CARLESON MEASURES, TREES, EXTRAPOLATION, AND T(b) THEOREMS



P. AUSCHER, S. HOFMANN, C. MUSCALU, T. TAO, AND C. THIELE 

Abstract. The theory of Carleson measures, stopping time arguments, and atomic
decompositions has been well-established in harmonic analysis. More recent is the theory
of phase space analysis from the point of view of wave packets on tiles, tree selection
algorithms, and tree size estimates. The purpose of this paper is to demonstrate that
the two theories are in fact closely related, by taking existing results and reproving them
in a unified setting. In particular we give a dyadic version of extrapolation for Carleson
measures, as well as a two-sided local dyadic T(b) theorem which generalizes earlier T(b)
theorems of David, Journé, Semmes, and Christ.

1. Introduction

The purpose of this article is to demonstrate the close connection between two sets of 
techniques in harmonic analysis: the theory of Carleson measures and related objects, 
and the theory of trees and related objects. 

A Carleson measure is a positive measure $\mu$ on the upper half-space $\mathbb{R}^{n+1}_+$ such that
$\mu(I \times (0, \ell(I))) \lesssim |I|$ for every cube $I \subset \mathbb{R}^n$ with side length $\ell(I)$.
There is also an analogous notion for domains more general than the half-space, as well as
a discrete version: if $\mu$ is a mapping from dyadic cubes into the non-negative reals, then
$\mu$ satisfies a (discrete) Carleson measure condition if $\sum_{I \subseteq J} \mu_I \lesssim |J|$
for every dyadic cube $J$, where the sum runs over
all dyadic sub-cubes of J. Carleson measures are intimately connected with many aspects 
of harmonic analysis, including non-tangential behavior of functions in the half-space (or
in a domain) (see e.g. 0]), $H^p$ theory and BMO p8|, boundedness of singular integrals,
square functions and maximal functions (e.g. [O, |22|, [B3], pOl, [M, pq]), geometric
measure theory (e.g. 0, [|5|, |6[), and PDE (e.g. [|, [0], H, [||). Moreover, via their
connection with the theory of trees, Carleson measures have played a significant role in
recent work on bilinear singular integrals p9| , ^ ^ and (rather appropriately!)
Carleson's theorem on a.e. convergence of Fourier series |]25[, In these latter
connections it is more convenient to work in the phase plane than in the Carleson half-space,
and we have deliberately chosen our notation to reflect this fact.

This article is mainly expository. Apart from one main new result (a local T(b) theorem),
we shall mostly take existing results (atomic decompositions, paraproduct estimates,
Carleson embedding) and re-prove them in a framework which unifies both the Carleson
measure theory and the theory of trees and tiles. (As such there is some overlap with the
recent lecture notes in [P^]).

Since this is an expository article, we shall simplify matters and only work in one 
dimension R. Also, we shall mostly work in the dyadic setting instead of the continuous 
one, to avoid issues such as rapidly decreasing tails or use of the Vitali covering lemma. 
Thus, our results will be phrased using dyadic intervals and the Haar basis instead of 
arbitrary intervals and Gaussians (or similar smooth kernels) . However most of our results 
have continuous analogues (see e.g. for a comparison between dyadic and continuous 
harmonic analysis). We also will truncate all our spaces to be finite-dimensional to avoid 
technicalities. 

The paper is organized as follows. After setting up the notation of dyadic Carleson 
measures and BMO, Haar wavelets, and tiles and trees, we will give a quick review of the 
standard "$L^\infty$" theory of BMO (i.e. measuring the ways in which BMO is close to $L^\infty$),
but from the perspective of trees and tiles. As part of this $L^\infty$ theory, we give a
trees-based proof of the (dyadic analogue of the) extrapolation lemma for Carleson measures
developed recently in 0, [Q. We also give an alternate proof of the extrapolation 
lemma due to John Garnett. 

We then show how BMO is also useful in "$L^p$" contexts, mainly through a BMO version
of the Calderón-Zygmund decomposition. This type of lemma is used often in the recent
work on Carleson's theorem and the bilinear Hilbert transform, and is implicit in earlier 
work on Carleson measures and similar objects; we illustrate this by using the BMO 
Chebyshev inequality to re-prove the standard atomic decomposition of H^. 

Next, we prove the Carleson embedding theorem and give its usual applications to
paraproduct estimates and the T(1) theorem. We also give a short proof of the boundedness
of paraproducts below L^; the proof is more direct than earlier proofs in that one does 
not go explicitly through the T(1) theorem.

Finally, we consider Calderón-Zygmund operators. We prove a two-sided local T(b)
theorem which generalizes the existing local and global T(b) theorems (|2^, 0, ||TT| ,
||50|); for instance, we can prove the standard global T(b) theorem assuming that $b$ is only
in BMO rather than $L^\infty$.

The T(b) Theorem, in its various guises, has its roots in a question posed by Yves
Meyer, who asked whether the T(1) Theorem of David and Journé (see also Chapter
6 below) remains true if the constant function 1 is replaced by some function $b \in L^\infty$
with $\operatorname{Re} b \ge \delta > 0$ (such $b$ are said to be "accretive"). The question was motivated by its
applicability to the boundedness of the Cauchy integral operator on a Lipschitz graph.
Indeed, if $\Gamma$ denotes the graph, in the plane, of a real-valued Lipschitz function $A$, then
by Cauchy's theorem, we have that in the sense of BMO (that is, modulo constants),

$$0 = \mathrm{p.v.} \int_\Gamma \frac{1}{z - w}\, dw$$

for $z \in \Gamma$. But in graph co-ordinates, this amounts to saying that (again in the sense of
BMO),

$$0 = \mathrm{p.v.} \int \frac{1 + iA'(y)}{x - y + i(A(x) - A(y))}\, dy = T(b)(x)$$

where $b$ is the accretive function $1 + iA'$, and $T$ is the singular integral operator naturally
associated to the antisymmetric Calderón-Zygmund kernel $K(x, y) = (x - y + i(A(x) - A(y)))^{-1}$.
The boundedness of $T$, and hence also that of the Cauchy integral operator

$$C_\Gamma f(z) = \mathrm{p.v.} \int_\Gamma \frac{f(w)}{z - w}\, dw,$$

thus follows from an analogue of the T(1) Theorem in which the condition $T(1), T^*(1) \in$
BMO is replaced by the condition $T(b) = 0 = T^*(b)$, for some accretive function $b$.
Just such a result was proved by McIntosh and Meyer ^3[, who consequently obtained
an alternative proof of their earlier joint result with Coifman concerning the
boundedness of the operator $C_\Gamma$.

This "T(b) Theorem" was generalized by David, Journé and Semmes to allow
$T(b), T^*(b) \in$ BMO (indeed, they allowed other generalizations as well, for example that
there could be two different accretive functions $b_1, b_2$ such that $T(b_1), T^*(b_2) \in$ BMO,
and moreover that the pointwise accretivity condition could be relaxed to a condition
holding on various sorts of averages - see, e.g., the notion of "pseudo-accretivity" defined
in Section 6.1 below).

This led to a proof of the T(b) Theorem by constructing Haar wavelets adapted to the
function $b$ |^ (we shall base our proof on a variation of these adapted Haar wavelets).

A very simple proof of a "one-sided version" of the T(b) Theorem was obtained by
Semmes [Q, who observed that in the special case $T(b) \in$ BMO, $T^*(1) = 0$, one can
readily show that $T(1) \in$ BMO, thus reducing matters to the T(1) Theorem. It is worth
noting that a suitable adaptation of Semmes's argument is applicable to the solution of
the square root problem of Kato. Indeed, one of the present authors (Auscher), along
with Tchamitchian P], formulated a version of the T(b) Theorem whose proof was based
upon the argument of [§0 and which was subsequently used to solve the Kato problem
in higher dimensions | 11. We further note that there are local versions of the
T(b) Theorem, due to M. Christ [|l^ (cf. Theorem |6.^ below), which also have interesting
applications, namely to questions of analytic capacity; see for instance for further
discussion.

This work was conducted at U. Missouri, UCLA, and the Centre for Mathematics and
its Applications (CMA) at ANU. The authors are particularly grateful to CMA for their
[...] Fellow and is supported by a grant from the Packard Foundation. CT is supported by
[...]

2. Notation

We use $A \lesssim B$ to denote the estimate $A \le CB$ for some absolute constant $C$ which
may vary from line to line.

If $E$ is a set, we use $|E|$ to denote the Lebesgue measure of $E$. We will always be
ignoring sets of measure zero, thus we only consider two sets $E$, $F$ to be intersecting if
$|E \cap F| > 0$.

Although our functions may be complex valued, we shall use the real inner product

$$\langle f, g \rangle := \int f(x) g(x)\, dx$$

throughout.

2.1. Tiles and trees. We shall be working with dyadic intervals throughout the paper.
The number of dyadic intervals is infinite, but to simplify the arguments we shall restrict
ourselves to a finite set on the half-line; in applications, this restriction can always be
removed by a standard translation and limiting argument. Specifically, we fix a large
integer $M > 0$; none of our estimates will depend on $M$. We define a dyadic interval to be an
interval^1 of the form $I = [j2^k, (j+1)2^k]$, where $j$, $k$ are integers such that $-M < k \le M$
and $I \subset [0, 2^M]$. Let $\mathbf{I}$ denote the set of all dyadic intervals; observe that $\mathbf{I}$ is finite. All
sums and unions involving $I$ or $J$ will be assumed to be over $\mathbf{I}$ unless otherwise specified.
If $f$ is a function on $\mathbb{R}$, we define $[f]_I := \frac{1}{|I|} \int_I f$ to denote the mean of $f$ on $I$. We use
$2I$ to denote the parent^2 of $I$, and $I_l$, $I_r$ to denote the left and right children of $I$ (these
are undefined if $|I| = 2^M$ or $|I| = 2^{-M+1}$ respectively). We refer to the intervals $I_l$ and $I_r$
as siblings.

Since our dyadic intervals have been restricted to a finite set, all norms will automatically
be finite and all stopping time processes will automatically terminate. This allows
us to avoid some minor technicalities in our arguments, although it also means that we
occasionally have to treat the smallest scale $|I| = 2^{-M+1}$ or the largest scale $|I| = 2^M$ a
little differently from all the other scales.

A major advantage of the dyadic setting is the nesting property: if $I$, $J$ are dyadic
intervals which intersect each other, then either $I \subseteq J$ or $J \subseteq I$. In particular, for
any collection of dyadic intervals, the maximal intervals in this collection will always be
disjoint.

^1 We will be careless about whether our intervals are closed, half-open, or open because of our
convention of ignoring sets of measure zero.

^2 In the non-dyadic theory $2I$ is often used to denote the interval with the same center as $I$ but twice
the length; this can be thought of as a non-dyadic version of the parent of $I$. However, in this paper we
use $2I$ exclusively to refer to the dyadic parent of $I$, i.e. the unique dyadic interval of twice the length
which contains $I$.
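The truncated family of dyadic intervals, together with the parent, children and nesting property described in Section 2.1, can be sketched in a few lines of Python. This is only an illustration: the class `Dyadic`, the helper names, the value `M = 3`, and the reading of the scale range as $-M < k \le M$ are all assumptions made here, not notation from the paper.

```python
from fractions import Fraction
from typing import NamedTuple, Optional, Tuple

M = 3  # truncation parameter; an illustrative value only


class Dyadic(NamedTuple):
    """The dyadic interval [j * 2**k, (j + 1) * 2**k] (hypothetical helper)."""
    j: int
    k: int

    @property
    def length(self) -> Fraction:
        return Fraction(2) ** self.k

    @property
    def left(self) -> Fraction:
        return self.j * self.length

    @property
    def right(self) -> Fraction:
        return (self.j + 1) * self.length


def all_intervals() -> list:
    """Every dyadic interval inside [0, 2**M], assuming scales -M < k <= M."""
    return [Dyadic(j, k) for k in range(-M + 1, M + 1) for j in range(2 ** (M - k))]


def parent(I: Dyadic) -> Optional[Dyadic]:
    """The dyadic parent 2I, or None at the largest scale."""
    return None if I.k == M else Dyadic(I.j // 2, I.k + 1)


def children(I: Dyadic) -> Optional[Tuple[Dyadic, Dyadic]]:
    """The left and right children, or None at the smallest scale."""
    if I.k == -M + 1:
        return None
    return Dyadic(2 * I.j, I.k - 1), Dyadic(2 * I.j + 1, I.k - 1)


def intersects(I: Dyadic, J: Dyadic) -> bool:
    """Intersection of positive measure (sets of measure zero are ignored)."""
    return max(I.left, J.left) < min(I.right, J.right)


def contains(J: Dyadic, I: Dyadic) -> bool:
    """True when I is a subset of J."""
    return J.left <= I.left and I.right <= J.right


def nesting_holds() -> bool:
    """Check the nesting property over the whole (finite) family."""
    ivs = all_intervals()
    return all(contains(I, J) or contains(J, I)
               for I in ivs for J in ivs if intersects(I, J))
```

Because the family is finite, `nesting_holds()` can verify the nesting property by brute force; exact rational endpoints avoid any floating-point ambiguity at shared endpoints.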
The theory of Carleson measures is usually set in the upper half-space

$$\mathbb{R}^2_+ := \{(x,t) : x \in \mathbb{R}, t \in \mathbb{R}^+\}.$$

Actually, because of our truncation parameter $M$ we will work in the compact subset

$$(\mathbb{R}^2_+)_M := \{(x,t) : x \in [0, 2^M], t \in [2^{-M}, 2^M]\}.$$

The variable $x$ represents spatial position, while the $t$ variable represents time, wavelength,
or spatial scale. For every dyadic interval $I \in \mathbf{I}$, we let $\ell(I) = |I|$ denote the side-length
of $I$, and define the Carleson box $Q(I) \subset (\mathbb{R}^2_+)_M$ by

$$Q(I) := I \times [2^{-M}, \ell(I)]$$

and the Whitney box $Q^+(I) \subset Q(I)$ by

$$Q^+(I) := I \times [\tfrac{1}{2}\ell(I), \ell(I)].$$

We remark that we have the partition $Q(I) = \bigcup_{J \subseteq I} Q^+(J)$.

Meanwhile, the theory of trees and tiles is usually set in phase space

$$\mathbb{R}^2 := \{(x,\xi) : x \in \mathbb{R}, \xi \in \mathbb{R}\}.$$

Because of our truncation, and because we are in the dyadic setting, we will instead work
in the region^3

$$(\mathbb{R}^+)^2 := \{(x,\xi) : x \in [0, 2^M], \xi \in [0, 2^M]\}.$$

The variable $x$ represents spatial position, while $\xi$ represents frequency. A Heisenberg tile
or simply tile is a rectangle in $(\mathbb{R}^+)^2$ of the form $P := I_P \times \omega_P$, where $I_P$ and $\omega_P$ are dyadic
intervals such that $|P| = |I_P||\omega_P| = 1$.

If $P$ and $Q$ are tiles, we say that $P \le Q$ if $P$ intersects $Q$ and $I_P \subseteq I_Q$. This is a partial
order on tiles.

If $I$ is a dyadic interval, we define the lacunary tile $P^+(I)$ by

$$P^+(I) := I \times [1/|I|, 2/|I|]$$

and the non-lacunary tile $P^0(I)$ by

$$P^0(I) := I \times [0, 1/|I|].$$

Let $\mathbf{P}^+$ denote the set of all lacunary tiles, and $\mathbf{P}^0$ the set of all non-lacunary tiles. We
define $P^+(I) \le' P^+(J)$ if and only if $P^0(I) \le P^0(J)$, or equivalently if $I \subseteq J$. Thus $\le'$ is
a partial ordering on $\mathbf{P}^+$. Of course, there are many tiles which are not in either of these
two sets, and many results in this paper can be extended to general tiles. However, for
simplicity we shall mostly restrict ourselves to the lacunary and non-lacunary tiles. We
write $[f]_P$ as shorthand for the averages $[f]_{I_P}$.

If $P^+(I)$ is a lacunary tile, we define the parent $2P^+(I)$ of $P^+(I)$ by $2P^+(I) := P^+(2I)$.
Similarly define $2P^0(I) := P^0(2I)$.



^3 In truth, we are working not with the Euclidean field $\mathbb{R}$, but with the Walsh field = (Ti-i)^ . See
e.g. p.
Figure 1. The geometry of the Carleson half-plane (partitioned into Whitney
boxes) and phase space (partitioned into non-lacunary tiles). The
heuristic $t = 1/\xi$ provides a one-to-one correspondence between the two
partitions.



A lacunary^4 tree (henceforth abbreviated as tree) is a collection $T \subset \mathbf{P}^+$ of lacunary
tiles with a top tile $P_T \in T$, such that $P \le' P_T$ for all $P \in T$. We use $I_T$ as short-hand
for $I_{P_T}$. If $P \in \mathbf{P}^+$, we define the complete tree Tree($P$) to be the tree

$$\mathrm{Tree}(P) := \{Q \in \mathbf{P}^+ : Q \le' P\}$$

with top $P$. We sometimes write Tree($I$) for Tree($P^+(I)$). Note that every tree $T$ lies
inside a complete tree Tree($P_T$). If $T$ is a tree inside a collection $\mathbf{P}$ of tiles, we say that
$T$ is complete with respect to $\mathbf{P}$ if $T = \mathrm{Tree}(P_T) \cap \mathbf{P}$.

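Let $\alpha > 0$ and $T$ be a tree. We define an $\alpha$-packing of $T$ to be a set $\mathbf{P} \subset T$ of tiles such
that

$$\sum_{P \in \mathbf{P}} |I_P| \le \alpha |I_T|.$$

We say that $\mathbf{P}$ is a uniform $\alpha$-packing^5 of $T$ if

$$\sum_{P \in \mathbf{P} : I_P \subseteq J} |I_P| \le \alpha |J|$$

for all dyadic intervals $J$.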



^4 Non-lacunary trees $T \subset \mathbf{P}^0$ are also useful in the study of the bilinear Hilbert transform and Carleson's
operator; more precisely, when treating the bilinear Hilbert transform $\langle B(f,g), h \rangle$ one uses a triple of
trees associated to $f$, $g$, $h$ respectively, with two of the trees lacunary and the third non-lacunary (but
possibly with a non-zero frequency origin). Similarly when treating the Carleson operator $\langle C_{N(x)} f, \chi_E \rangle$
one uses a pair of trees associated to $f$ and $\chi_E$ respectively, with one lacunary and one non-lacunary.
See [|o|, However we will not use non-lacunary trees explicitly in this paper, although they appear
implicitly in Lemma 4.2 and in the paraproduct theory.

^5 This is roughly equivalent to $\sum_{P \in \mathbf{P}} \chi_{I_P}$ having a BMO norm bounded by $\alpha$.
Phase space                          Upper half-plane
Lacunary tile $P^+(I)$               Whitney box $Q^+(I)$
Non-lacunary tile $P^0(I)$           "Tower" $I \times [\ell(I), \infty)$
Complete tree Tree($I$)              Carleson box $Q(I)$
Convex tree                          Carleson box above a Lipschitz graph
Size $\|\mu\|_{\mathrm{size}(T)}$    Normalized mass $\mu(Q(I))/|I|$
Bounded maximal size                 Carleson measure (or BMO function)
$\xi$                                $1/t$

Figure 2. A partial dictionary between tree terminology, and Carleson
measure terminology. In our paper the two viewpoints are essentially equivalent,
however the phase space viewpoint is better adapted to handle more
general situations where one needs to modulate in frequency. Conversely,
Carleson measures are better adapted to complex analysis applications.

If $\alpha < 1/2$ and $\mathbf{P}$ is an $\alpha$-packing of $T$, observe that the parent tiles $2\mathbf{P} := \{2P : P \in \mathbf{P}\}$
form a $2\alpha$-packing of $T$. Similarly if $\mathbf{P}$ is a uniform $\alpha$-packing of $T$, then $2\mathbf{P}$ is a uniform
$2\alpha$-packing of $T$.

We say that a collection of lacunary tiles $\mathbf{P}$ is convex if for every pair of tiles $P_1 \le' P_2$
in $\mathbf{P}$, the set $\{P \in \mathbf{P}^+ : P_1 \le' P \le' P_2\}$ is also contained in $\mathbf{P}$. We will usually be dealing
with convex trees in this paper.

The correspondence between the upper half-space and phase space is given by the
heuristic formula^6

$$t = 1/\xi \tag{1}$$

in other words, frequency is the reciprocal of wavelength. This correspondence identifies
Whitney boxes $Q^+(I)$ with lacunary tiles $P^+(I)$, and identifies a Carleson box $Q(I)$
with the complete tree Tree($I$). (Incomplete trees $T$ are identified with the portion of a
Carleson box above a "dyadic Lipschitz graph", cf. 0). Note how this correspondence
clearly gives a privileged position to the frequency origin $\xi = 0$.

The thesis of this paper is that the theory of Carleson measures can be equated with 
the theory of lacunary tiles. The theory of general tiles - which is needed for applications 
such as Carleson's theorem and the bilinear Hilbert transform, in which the frequency 
origin plays no distinguished role - can then be thought of as a generalization of Carleson
measure theory^7.

^6 To be completely precise, one would have to adjust this formula when $\xi < 2^{-M}$, but as this is only
a heuristic anyway we will not bother to do this.

^7 In our paper, we will only need tiles which are centered at or near the frequency origin, in which
case it does not particularly matter whether we use the Carleson half-plane or the phase plane. However,
we have chosen to use phase space notation (using frequency $\xi$ instead of wavelength $t$) as this is more
compatible with the more general theory of multilinear operators such as the bilinear Hilbert transform (or
the Carleson maximal operator), which are invariant under translations of the frequency variable. Note
that the modulation operation $f \mapsto e^{2\pi i \xi_0 x} f$ can be represented easily in the phase plane as a translation
by $\xi_0$ in the $\xi$ variable, but is not so elegantly representable in the Carleson half-plane. Nevertheless,
we will not need to modulate in frequency in this paper, so the Carleson viewpoint and the phase space
viewpoint are essentially equivalent here.
2.2. Size and Carleson measures. Let $T$ be a convex lacunary tree, and suppose that
we have a function $a : T \to \mathbb{R}^+$ assigning non-negative numbers to each tile in $T$. We
define the size of $a$ on $T$ by

$$\|a\|_{\mathrm{size}(T)} := \frac{1}{|I_T|} \sum_{P \in T} a(P). \tag{2}$$

Now let $\mathbf{P}$ be any collection of lacunary tiles, and let $a : \mathbf{P} \to \mathbb{R}^+$. We define the maximal
size of $a$ on $\mathbf{P}$ by

$$\|a\|_{\mathrm{size}^*(\mathbf{P})} := \sup_{T \subseteq \mathbf{P}} \|a\|_{\mathrm{size}(T)} \tag{3}$$

where $T$ ranges over all convex lacunary trees in $\mathbf{P}$; we adopt the convention that
$\|a\|_{\mathrm{size}^*(\mathbf{P})} = 0$ if $\mathbf{P}$ is empty. The notion of size and maximal size is analogous to
$\alpha$-packings and uniform $\alpha$-packings. For instance, the following lemma is immediate from
the definitions:

Lemma 2.1. If $\mathbf{P}$ is a uniform $\alpha$-packing of a tree $T$, and $a(P)$ obeys the weak Carleson
condition

$$|a(P)| \le A|I_P| \quad \text{for all } P \in \mathbf{P}$$

for some $A > 0$, then $\|a\|_{\mathrm{size}^*(\mathbf{P})} \le A\alpha$.

If $\mu$ is a non-negative measure on the truncated upper half-space $(\mathbb{R}^2_+)_M$, then it assigns
a non-negative number to each Whitney box. By the correspondence (1), we can thus
assign to each lacunary tile $P = P^+(I)$ a number $\mu(P)$ by the formula $\mu(P^+(I)) :=
\mu(Q^+(I))$. We say that $\mu$ is a Carleson measure if

$$\|\mu\|_{\mathrm{size}^*(\mathbf{P}^+)} < \infty.$$

The reader may easily verify that this is equivalent to the usual formulation of a Carleson
measure, namely that $\mu(Q(I)) \le C|I|$ for some constant $C$.
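The equivalence with the usual Carleson condition can be tested numerically on a small truncated family. The sketch below is an illustration under stated assumptions: lacunary tiles are identified with their dyadic intervals `(j, k)`, the scale range is taken to be $-M < k \le M$, the value `M = 2` and all function names are hypothetical, and the maximal size over all of $\mathbf{P}^+$ is computed on complete trees only (which suffices there because $a \ge 0$, so for a fixed top the complete tree gives the largest sum).

```python
from fractions import Fraction

M = 2  # illustrative truncation parameter


def intervals():
    """Dyadic intervals (j, k) = [j 2^k, (j+1) 2^k] in [0, 2^M], assuming -M < k <= M."""
    return [(j, k) for k in range(-M + 1, M + 1) for j in range(2 ** (M - k))]


def length(I):
    _, k = I
    return Fraction(2) ** k


def inside(I, J):
    """True when the interval I is a subset of the interval J."""
    (ji, ki), (jj, kj) = I, J
    return ki <= kj and (ji >> (kj - ki)) == jj


def size_complete_tree(a, J):
    """Size of a on Tree(J): (1/|J|) times the sum of a over tiles P^+(I), I in J."""
    total = sum(a.get(I, 0) for I in intervals() if inside(I, J))
    return total / length(J)


def maximal_size(a):
    """Maximal size of a over all lacunary tiles (sup over complete trees)."""
    return max(size_complete_tree(a, J) for J in intervals())


# Mass |I| placed on a single scale: a Carleson sequence with constant 1.
single_scale = {I: length(I) for I in intervals() if I[1] == 0}

# Mass |I| on every scale: each scale contributes |J| to Tree(J), so the
# normalized mass counts scales and grows with the truncation parameter.
every_scale = {I: length(I) for I in intervals()}
```

In the second example the maximal size equals the number of scales, so it is finite for each fixed `M` but not bounded independently of `M`; this is the sense in which such a sequence fails to be Carleson.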

2.3. Wavelets, phase space projections, and BMO. Let $P$ be a lacunary tile. We
define the (mother) Haar wavelet $\phi_P$ to be the $L^2$-normalized function

$$\phi_P := |I_P|^{-1/2} (\chi_{I_P^l} - \chi_{I_P^r})$$

where $I_P^l$ and $I_P^r$ are the left and right halves of $I_P$ respectively. Similarly, if $P$ is a
non-lacunary tile, we define the (father) Haar wavelet $\phi_P$ by

$$\phi_P := |I_P|^{-1/2} \chi_{I_P}.$$

Observe that these functions are normalized in $L^2$, and that $\phi_P$ and $\phi_{P'}$ are orthogonal
whenever $P$ and $P'$ are disjoint^8.

It is in fact possible to assign a function $\phi_P$ to every Heisenberg tile; these functions are
known as Walsh wave packets, see e.g. [52] for a discussion. These Walsh packets can then
be used to efficiently decompose such operators as the (Walsh) bilinear Hilbert transform
or the (Walsh) Carleson operator, just as the Haar wavelets can be used to decompose
(dyadic) paraproducts or (dyadic) Calderón-Zygmund operators; see in particular the
remarks after (^81) . However, we shall not make any use of the Walsh wave packets for
the results in this paper.

^8 For a pair of lacunary tiles, this means that $I_P \ne I_{P'}$; for a pair of non-lacunary tiles, this means
that $I_P$ and $I_{P'}$ are disjoint. For lacunary $P$ and non-lacunary $P'$, this means that $I_{P'}$ is not a proper
subset of $I_P$.

We define a (dyadic) test function to be any finite linear combination of mother and
father Haar wavelets $\phi_P$. We use $S$ to denote the space of all test functions, and $S_0$ to
denote the test functions with mean zero. For any dyadic interval $I$, we define $S(I)$ to be
the elements of $S$ which are supported in $I$, and similarly define $S_0(I)$. Note that $S(I)$ is
only one dimension larger than $S_0(I)$, and is in fact spanned by $S_0(I)$ and any function
$b_I \in S(I)$ with non-zero mean. This fact will be used much later on when we discuss local
T(b) theorems.

Since the mother Haar wavelets are orthonormal, we have the representation formula^9

$$f = \sum_{P \in \mathbf{P}^+} \langle f, \phi_P \rangle \phi_P$$

for all $f \in S_0$.

If $f \in S$ and $\mathbf{T}$ is any collection of disjoint tiles in $\mathbf{P}^+ \cup \mathbf{P}^0$ (i.e. the tiles in $\mathbf{T}$ can be
lacunary or non-lacunary), we define the phase space projection $\Pi_{\mathbf{T}}$ by

$$\Pi_{\mathbf{T}} f := \sum_{P \in \mathbf{T}} \langle f, \phi_P \rangle \phi_P.$$

This is an orthogonal projection from $L^2(\mathbb{R})$ to the space of functions spanned by $\{\phi_P :
P \in \mathbf{T}\}$. For instance, we have $\Pi_{P^0(I)} f = [f]_I \chi_I$ and

$$\Pi_{\mathrm{Tree}(I)} f = (f - [f]_I) \chi_I. \tag{4}$$

More generally, for any convex tree $T \subset \mathbf{P}^+$ and any $f \in S$ we have

$$\Pi_T f(x) = [f]_{I(x,T)} - [f]_{J(x,T)} \tag{5}$$

for some intervals $I(x,T)$, $J(x,T)$ containing $x$; the exact choice of these intervals depends
on $T$. The formula (5) can be derived by writing $T$ as a complete tree Tree($P_T$) with some
smaller complete trees removed, and then using (4).

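If $f \in S$, we define the wavelet transform $Wf$ of $f$ to be the function

$$Wf(P) := \langle f, \phi_P \rangle$$

defined on $\mathbf{P}^+$; this is an isometry^10 between $S_0$ (endowed with the $L^2$ norm) and $l^2(\mathbf{P}^+)$.
In particular, the function $|Wf|^2$ maps $\mathbf{P}^+$ to $\mathbb{R}^+$, and so one can compute the size of
$|Wf|^2$ on various collections of trees. We observe in particular that

$$\||Wf|^2\|_{\mathrm{size}(T)} = \frac{1}{|I_T|} \|\Pi_T f\|_2^2 \le \frac{1}{|I_T|} \int_{I_T} |f - [f]_{I_T}|^2 \le \frac{1}{|I_T|} \int_{I_T} |f|^2 \tag{6}$$

for all trees $T$.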
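For a complete tree the first inequality in (6) is an equality, by (4) and orthonormality of the Haar system, and this can be checked exactly on a small example. The following sketch makes several simplifying assumptions that are not in the paper: the finest cells are normalized to have length 1, $f$ is an integer-valued step function stored as a Python list, and the function names are hypothetical.

```python
from fractions import Fraction


def haar_sq(f, start, L):
    """|<f, phi_I>|^2 for I = [start, start + L), with L >= 2 a power of two.

    f is a list of integers, f[i] being the value of f on the cell [i, i + 1).
    """
    half = L // 2
    diff = sum(f[start:start + half]) - sum(f[start + half:start + L])
    return Fraction(diff * diff, L)  # phi_I carries the factor |I|^(-1/2)


def tree_intervals(start, L):
    """All dyadic intervals of length >= 2 contained in [start, start + L)."""
    out = []
    size = L
    while size >= 2:
        out += [(s, size) for s in range(start, start + L, size)]
        size //= 2
    return out


def size_of_Wf_sq(f, start, L):
    """Size of |Wf|^2 on the complete tree over I = [start, start + L)."""
    return sum(haar_sq(f, s, l) for s, l in tree_intervals(start, L)) / L


def mean_sq_oscillation(f, start, L):
    """(1/|I|) times the integral over I of |f - [f]_I|^2."""
    vals = [Fraction(v) for v in f[start:start + L]]
    avg = sum(vals) / L
    return sum((v - avg) ** 2 for v in vals) / L


def mean_sq(f, start, L):
    """(1/|I|) times the integral over I of |f|^2."""
    return Fraction(sum(v * v for v in f[start:start + L]), L)


sample = [3, 1, 4, 1, 5, 9, 2, 6]
```

With `sample` above, comparing `size_of_Wf_sq(sample, 0, 8)` against `mean_sq_oscillation(sample, 0, 8)` and `mean_sq(sample, 0, 8)` reproduces the chain (6) for the complete tree over the whole interval; exact rational arithmetic means the equality can be asserted without tolerances.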

If $f \in S$, we define the dyadic BMO norm of $f$ by

$$\|f\|_{BMO} := \||Wf|^2\|_{\mathrm{size}^*(\mathbf{P}^+)}^{1/2}. \tag{7}$$

^9 The continuous version of this would be a Calderón reproducing formula such as
$f(x) = -\int_0^\infty 2t^2 \Delta e^{t^2 \Delta} f(x)\, \frac{dt}{t}$.

^10 The continuous analogue of this in the upper half-plane with measure $dx\, dt/t$ would be the function
$Q_t f(x)$, where $Q_t$ is a suitable cancellative averaging operator with wavelength $t$, e.g. $Q_t := t^2 \Delta e^{t^2 \Delta}$.
The reader may easily verify that this definition corresponds to the usual ($L^2$-based)
definition of dyadic BMO. Note that the projections $\Pi_T$ defined earlier are bounded on
$L^2$ and BMO (in fact they are contractions).

Thus the concepts of Carleson measure, BMO, and maximal size are essentially the
same concept. However, the concept of maximal size extends more easily to general
families of tiles (not necessarily lacunary) than the other two notions. (In particular, the
notion of maximal size on a tree centered at an arbitrary frequency $\xi_0$ is central to the
boundedness of the bilinear Hilbert transform, see p9|).



2.4. Mean. We shall need a notion of mean (or "normalized mass"), which can be
thought of as a "non-lacunary" variant of size. Given any function $f$ on $\mathbb{R}$ and a tile
$P \in \mathbf{P}^+$, we define

$$\|f\|_{\mathrm{mean}(P)} := [|f|]_{I_P} = \frac{1}{|I_P|} \int_{I_P} |f|$$

and for any collection $\mathbf{P} \subset \mathbf{P}^+$ of lacunary tiles we define

$$\|f\|_{\mathrm{mean}^*(\mathbf{P})} := \sup_{P \in \mathbf{P}} \|f\|_{\mathrm{mean}(P)}.$$

Like the notion of BMO, the notion of mean has the scaling of $L^\infty$. One can extend the
notion of mean to arbitrary tiles; the function $f$ should then be replaced by a measure on
phase space. For instance, in applications to Carleson's theorem |jlO| the notion of mean
is applied to a measure of the form $\chi_E(x) \delta(\xi - N(x))$, where $N$ is an arbitrary function.
See H, m.



3. The $L^\infty$ theory

It is well known that the notion of maximal size or BMO can be thought of as a stable
substitute for the $L^\infty$ norm, which is often ill-suited for applications. In this section we
develop the standard theory for this norm.

We begin with a simple but very useful principle: to bound the maximal size of a
collection of tiles, it suffices to do so outside of a $(1 - \eta)$-packing of each tree in the
collection.

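Lemma 3.1. Suppose $\mathbf{P}$ is a collection of lacunary tiles, and let $a : \mathbf{P} \to \mathbb{R}^+$, $A > 0$,
and $0 < \eta < 1$ be such that for every tree $T$ which is complete with respect to $\mathbf{P}$, one has

$$\|a\|_{\mathrm{size}(T \setminus \bigcup_{T' \in \mathbf{T}} T')} \le A$$

for some collection $\mathbf{T}$ of trees in $T$ whose tops $\{P_{T'} : T' \in \mathbf{T}\}$ are a $(1 - \eta)$-packing of $T$.
Then we have

$$\|a\|_{\mathrm{size}^*(\mathbf{P})} \le A/\eta.$$

Proof. Let $T$ be a tree in $\mathbf{P}$. From hypothesis we have

$$\sum_{P \in T} a(P) = \sum_{P \in T \setminus \bigcup_{T' \in \mathbf{T}} T'} a(P) + \sum_{T' \in \mathbf{T}} \sum_{P \in T'} a(P)$$
$$\le A|I_T| + \sum_{T' \in \mathbf{T}} \|a\|_{\mathrm{size}^*(\mathbf{P})} |I_{T'}|$$
$$\le |I_T| (A + (1 - \eta) \|a\|_{\mathrm{size}^*(\mathbf{P})}).$$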
Figure 3. A convex tree T with top Pt = P- Note how this tree can be 
thought of as a complete tree Tree(P) with some smaller complete trees 
removed. In this particular tree, the tops of the trees removed form a |- 
packing of Tree(P), so that | of the tree T "makes it all the way to the 
top". On the Carleson half-plane, this region resembles the portion of a 
Carleson box above a Lipschitz graph. 

Dividing by $|I_T|$ and taking suprema of both sides we obtain

$$\|a\|_{\mathrm{size}^*(\mathbf{P})} \le A + (1 - \eta) \|a\|_{\mathrm{size}^*(\mathbf{P})}$$

and the claim follows.

An alternate (and perhaps more intuitive) proof of Lemma 3.1 is to start with a tree $T$,
estimate the "good" part $T \setminus \bigcup_{T' \in \mathbf{T}} T'$ of the tree, and then pass to the "bad" trees $T' \in \mathbf{T}$
and iterate this process until the tree is completely exhausted. Since the geometric series
$\sum_{n \ge 0} (1 - \eta)^n$ converges to $1/\eta$, the claim follows.
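The iteration just described can be written out as follows. This is a sketch under the extra assumption that every bad tree is itself complete with respect to $\mathbf{P}$, so that the hypothesis of Lemma 3.1 applies to it; the notation $\mathbf{T}(T')$ for the collection of bad trees that the hypothesis supplies for $T'$, and $\mathbf{T}_n$ for the $n$-th generation of bad trees, is introduced here only for illustration.

```latex
\begin{align*}
\mathbf{T}_0 &:= \{T\}, \qquad
\mathbf{T}_{n+1} := \bigcup_{T' \in \mathbf{T}_n} \mathbf{T}(T'), \\
\sum_{T' \in \mathbf{T}_n} |I_{T'}| &\le (1-\eta)^n\, |I_T|
  \qquad \text{(apply the packing bound $n$ times)}, \\
\sum_{P \in T} a(P)
  &\le \sum_{n \ge 0} \; \sum_{T' \in \mathbf{T}_n} A\, |I_{T'}|
   \le A\, |I_T| \sum_{n \ge 0} (1-\eta)^n
   = \frac{A}{\eta}\, |I_T| .
\end{align*}
```

Each tile of $T$ lies in the good part of exactly one tree of some generation, and the good part of $T'$ contributes at most $A|I_{T'}|$; the process terminates because the family of tiles is finite.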

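Corollary 3.2 (Good-lambda characterization of maximal size). Let $\mathbf{P}$ be a collection of
lacunary tiles, and let $a : \mathbf{P} \to \mathbb{R}^+$ be such that

$$\Big|\Big\{x \in I_T : \sum_{P \in T} a(P) \frac{\chi_{I_P}(x)}{|I_P|} > A\Big\}\Big| \le (1 - \eta) |I_T|$$

for some $A > 0$, $0 < \eta < 1$ and all trees $T \subset \mathbf{P}$. Then we have $\|a\|_{\mathrm{size}^*(\mathbf{P})} \le A/\eta$.

This lemma goes back at least to Fritz John. A partial converse can be obtained
from Markov's inequality:

$$\Big|\Big\{x \in I_T : \sum_{P \in T} a(P) \frac{\chi_{I_P}(x)}{|I_P|} > A\Big\}\Big| \le \frac{1}{A} \Big\|\sum_{P \in T} a(P) \frac{\chi_{I_P}}{|I_P|}\Big\|_1 = \frac{1}{A} |I_T| \|a\|_{\mathrm{size}(T)} \le \frac{1}{A} |I_T| \|a\|_{\mathrm{size}^*(\mathbf{P})}.$$

Proof. Let $T$ be any tree in $\mathbf{P}$. Consider the set $\mathbf{Q}$ of all tiles $Q \in T$ such that

$$\sum_{P \in T : Q \le' P} a(P)/|I_P| > A$$

and such that $Q$ is maximal with respect to the ordering $\le'$. By assumption $\mathbf{Q}$ is a
$(1 - \eta)$-packing of $T$.

By Lemma 3.1 it suffices to show that $\|a\|_{\mathrm{size}(T \setminus \bigcup_{Q \in \mathbf{Q}} \mathrm{Tree}(Q))} \le A$, or equivalently that

$$\int \sum_{P \in T \setminus \bigcup_{Q \in \mathbf{Q}} \mathrm{Tree}(Q)} a(P) \frac{\chi_{I_P}}{|I_P|}\, dx \le A |I_T|.$$

But by construction of $\mathbf{Q}$, the integrand is bounded by $A$ for all $x \in I_T$, and the claim
follows. ■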

A similar argument gives the well-known characterization of BMO:

Corollary 3.3. Let $0 < p < \infty$, and let $f \in S$ be such that $\frac{1}{|I|} \int_I |f - [f]_I|^p \lesssim 1$, or
equivalently that $\|\Pi_{\mathrm{Tree}(I)} f\|_p^p \lesssim |I|$, for all dyadic intervals $I$. Then

$$\|f\|_{BMO} \lesssim 1.$$

(The implicit constants depend on $p$).
Applying this corollary with $p = 1$ we obtain in particular that

$$\|f\|_{BMO} \sim \sup_I \frac{1}{|I|} \int_I |f - [f]_I|;$$

in fact, this is sometimes taken as the definition of (dyadic) BMO.

Proof. We need to show that $|Wf|^2$ has bounded maximal size. Let $T = \mathrm{Tree}(I)$ be any
complete tree for some interval $I$. By Lemma 3.1 it suffices to find a collection $\mathbf{T}$ of trees
in $T$ whose tops are a $\frac{1}{2}$-packing of $T$ such that

$$\||Wf|^2\|_{\mathrm{size}(T \setminus \bigcup_{T' \in \mathbf{T}} T')} \lesssim 1. \tag{8}$$

First observe from (4) that if $J \subset I$ and $x \in J$ then

$$\Pi_{\mathrm{Tree}(I)} f(x) - [\Pi_{\mathrm{Tree}(I)} f]_J = \Pi_{\mathrm{Tree}(J)} f(x)$$

so from hypothesis we have

$$\int_J |\Pi_{\mathrm{Tree}(I)} f - [\Pi_{\mathrm{Tree}(I)} f]_J|^p \lesssim |J|. \tag{9}$$

Now let $C_p$ be a large constant to be chosen later. Let $\mathbf{Q}$ denote the tiles $Q \in \mathrm{Tree}(I)$ such
that $|[\Pi_{\mathrm{Tree}(I)} f]_Q| > C_p$ and that $Q$ is maximal with respect to $\le'$. If $C_p$ is sufficiently
large, we see from (9) that

$$\int_{I_Q} |\Pi_{\mathrm{Tree}(I)} f|^p \gtrsim C_p^p |I_Q|$$

and hence that $\mathbf{Q}$ is a $\frac{1}{4}$-packing of $T$. In particular the collection $2\mathbf{Q} = \{2Q : Q \in \mathbf{Q}\}$ of
parents of tiles in $\mathbf{Q}$ is a $\frac{1}{2}$-packing of $T$.

We set $\mathbf{T} := \{\mathrm{Tree}(2Q) : 2Q \in 2\mathbf{Q}\}$, and define

$$F := \Pi_{T \setminus \bigcup_{T' \in \mathbf{T}} T'} f = \Pi_{\mathrm{Tree}(I)} f - \sum_{2Q \in 2\mathbf{Q}} \Pi_{\mathrm{Tree}(2Q)} f.$$

Then we can rewrite the left-hand side of (8) as $\frac{1}{|I|} \|F\|_2^2$.

By construction we see that $F$ is supported on $I$. Since $[\Pi_{\mathrm{Tree}(2Q)} f]_{2Q} = 0$, we see that
$F$ is constant on each $I_{2Q}$ and that

$$\|F\|_{L^\infty(I_{2Q})} = |[F]_{2Q}| = |[\Pi_{\mathrm{Tree}(I)} f]_{2Q}| \le C_p.$$

If $x \in I$ is not in any of the $I_{2Q}$, then

$$|F(x)| = |\Pi_{\mathrm{Tree}(I)} f(x)| \le C_p.$$

Thus we have $\|F\|_\infty \le C_p$. Combining this with the previous we obtain (8) as desired. ■

We now give the well-known converse to the above Corollary:

Lemma 3.4 (John-Nirenberg inequality). Let $I$ be a dyadic interval, and let $f \in S_0(I)$
be real-valued. Then we have

$$\|f\|_p \lesssim (1 + p) |I|^{1/p} \|f\|_{BMO}$$

for all $0 < p < \infty$ and

$$|\{x \in I : f(x) > 2n \|f\|_{BMO}\}| \le 2^{-n+1} |I| \quad \text{for all } n \in \mathbb{Z}^+.$$

Proof. It suffices to prove the latter inequality, as the former easily follows.

We prove the claim by induction on $n$. The claim is clear for $n = 1$. Now suppose that
$n > 1$ and the claim has already been proven for $n - 1$.

Fix $I$, $f$. Let $\mathbf{P}$ denote those tiles $P$ in Tree($I$) such that $[f]_P > 2\|f\|_{BMO}$, and such
that $P$ is maximal with respect to $\le'$. For each $P$ we have

$$\int_{I_P} |f|^2 \ge |I_P| |[f]_P|^2 \ge 4 |I_P| \|f\|_{BMO}^2.$$

On the other hand, from (6), (7) we have

$$\int_I |f|^2 \le |I| \||Wf|^2\|_{\mathrm{size}(\mathrm{Tree}(I))} \le |I| \|f\|_{BMO}^2.$$

Thus $\mathbf{P}$ is a $\frac{1}{4}$-packing of Tree($I$), so that the collection $2\mathbf{P} = \{2P : P \in \mathbf{P}\}$ of parents of
tiles in $\mathbf{P}$ form a $\frac{1}{2}$-packing of Tree($I$).

By construction we have $[f]_{2P} \le 2\|f\|_{BMO}$ for all $P \in \mathbf{P}$, and $f(x) \le 2\|f\|_{BMO}$ for all
$x \notin \bigcup_{P \in \mathbf{P}} I_{2P}$. Thus

$$\{x \in I : f(x) > 2n \|f\|_{BMO}\} \subset \bigcup_{2P \in 2\mathbf{P}} \{x \in I_{2P} : f(x) - [f]_{2P} > 2(n - 1) \|f\|_{BMO}\}.$$

The claim then follows from the inductive hypothesis. ■
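The remark that the $L^p$ bound follows easily from the distributional bound can be unpacked as follows. This is a sketch for $p \ge 1$ only, with $B := \|f\|_{BMO}$ as a shorthand introduced here, and with no attempt to optimize constants.

```latex
\begin{align*}
|\{x \in I : |f(x)| > \lambda\}| &\le 2^{-n+2}\,|I|
  \qquad \text{for } \lambda \ge 2nB,\ n \ge 0
  \quad \text{(apply the bound to $f$ and $-f$)}, \\
\int_I |f|^p
  &= p \int_0^\infty \lambda^{p-1}\, |\{x \in I : |f(x)| > \lambda\}| \, d\lambda
   = \sum_{n \ge 0} p \int_{2nB}^{2(n+1)B} \lambda^{p-1}\, |\{|f| > \lambda\}| \, d\lambda \\
  &\le \sum_{n \ge 0} 2^{-n+2}\, |I|\, \bigl(2(n+1)B\bigr)^p
   = 4\,(2B)^p\, |I| \sum_{n \ge 0} 2^{-n} (n+1)^p .
\end{align*}
```

Taking $p$-th roots gives $\|f\|_p \le 2B\,|I|^{1/p} \bigl(4 \sum_{n \ge 0} 2^{-n}(n+1)^p\bigr)^{1/p}$. One can check that the sum is bounded by a constant multiple of $\Gamma(p+1)/(\log 2)^{p+1}$, so by Stirling's formula its $p$-th root grows at most linearly in $p$, which is consistent with the factor $(1+p)$. For $0 < p < 2$ one can instead use Hölder's inequality together with the bound $\int_I |f|^2 \le |I| \|f\|_{BMO}^2$ from the proof.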
3.1. Chopping big trees into little trees, and extrapolation of Carleson measures.
Let $a : \mathbf{P}^+ \to \mathbb{R}^+$ be a function. Suppose we have a convex lacunary tree $T_0$ with
a large size, let's say

$$\|a\|_{\mathrm{size}^*(T_0)} \le C_0. \tag{10}$$

Let $0 < \delta < C_0$ be a small number. An obvious question to ask is whether one can
decompose the large tree $T_0$ into small trees, each of which has size less than or equal
to $\delta$. This is clearly impossible, as the example of a singleton tree $T_0$ with large size
demonstrates. However, one can do the next best thing:

Theorem 3.5. With the above assumptions, we have the disjoint partition

$$T_0 = \bigcup_{T \in \mathbf{T}} T \cup \mathbf{P} \tag{11}$$

where the trees $T$ in $\mathbf{T}$ are convex and satisfy

$$\|a\|_{\mathrm{size}^*(T)} \le \delta \tag{12}$$

while the tiles $P \in \mathbf{P}$ obey the estimate

$$a(P) \le C_0 |I_P|. \tag{13}$$

Furthermore, the tiles $\mathbf{P}$ and the tree tops $\{P_T : T \in \mathbf{T}\}$ are both uniform $C(C_0, \delta)$-packings
of $T_0$.



Note that Lemma 2.1 gives an easy converse to the above Theorem: if $T_0$ can be
partitioned by (11) with the above properties then $\|a\|_{\mathrm{size}^*(T_0)}$ is bounded (but by a much
larger constant than $C_0$). Thus, if one is willing to ignore losses in constants, the above
Theorem gives a complete characterization of trees of large size in terms of trees of small
size. As we shall see in this section, this theorem can be applied to give an extrapolation
lemma for Carleson measures, and seems likely to be useful in other contexts also.

A continuous parameter version of Theorem 3.5 is at least implicit in P], where, as
here, it is used to prove the "Extrapolation Lemma for Carleson Measures" (see Corollary
3.9 below). The latter, in its continuous parameter form, was then used to establish
the "restricted version" of the Kato square root conjecture, for $L^\infty$ perturbations of real,
symmetric, elliptic coefficient matrices. The essential idea of the extrapolation method
had previously been introduced by J. Lewis in his work with M. Murray on the heat
equation in non-cylindrical domains, and refined further by Lewis and one of the present
authors |^ in their work on parabolic and elliptic equations. Similar ideas had also
appeared previously in the work of David and Semmes on uniform rectifiability: indeed
Theorem 4.5 is very closely related to the "Corona Decomposition" of [p^ .

Roughly speaking, in applications of the extrapolation method, the idea is first to show 
that some "scale-invariant estimate on cubes" (like a Carleson measure estimate, a BMO 
estimate, or a reverse Hölder or $A_\infty$ estimate for a weight) holds when some controlling
Carleson measure is suitably small in a certain sense, which will be made precise in the 
sequel. The term "extrapolation" refers to the removal of the smallness restriction. In 
that sense it is analogous to G. David's technique for bootstrapping the Lipschitz constant 
(see, e.g., I^T]), although it is not clear whether there exists an explicit connection between 
the two methods. 
In 1^^, [0, for example, the controlling Carleson measure was a condition on either
the boundary of the domain, or on the coefficients of the elliptic or parabolic operator,
and one proved reverse Hölder inequalities for the associated elliptic-harmonic or parabolic
measures. In particular, in ||3^, the authors give an alternative proof, via extrapolation,
of the main theorem of R. Fefferman, Kenig and Pipher [^, in which the controlling
Carleson measure is a condition on the disagreement between the coefficients of two elliptic
(or parabolic) operators, in the case that reverse Hölder estimates are known to hold for
the elliptic-harmonic measures associated to the first operator, and one wishes to prove
such estimates for the second.

In [^, the authors exploit the fact that proving Kato's square root estimate is equivalent
(by "T(1)" type reasoning) to proving that a certain positive measure in the upper half
space is Carleson. Here, the controlling Carleson measure was the one associated to the
original, self-adjoint operator, and the extrapolation technique was used to prove that the
analogous measure, related to the square root estimate for the perturbed operator, was
also Carleson. It is in this setting that Corollary 3.9, or rather its continuous parameter
analogue, is directly applicable.

The proof we give here follows the approach in At the end of this section we give 
an alternate proof, due to John Garnett, which gives better dependence on constants. 

Before we give the rigorous proof, we first informally describe the idea of the argument.
Suppose the original tree $T_0$ has size $\|a\|_{\mathrm{size}(T_0)} = c$. Then $0 \le c \le C_0$ by (10). To create
a tree of maximal size less than $\delta$, we start with $T_0$ and remove from it some sub-trees
of size between $c + \delta/2$ and $c - \delta/2$, which we select by a straightforward stopping time
argument. It then remains to control the sub-trees that were removed. By shrinking the
trees slightly (putting the error into $\mathbf{P}$) we can assume that the trees have size either
greater than $c + \delta/2$ or less than $c - \delta/2$ (so that the tree that remains must have size at
most $\delta$). We call the first type of tree "heavy" and the second type "light". Because the
original tree had size $c$, it cannot be the case that $I_{T_0}$ is covered by heavy sub-trees, and so
a positive proportion of $I_{T_0}$ must be covered by light trees or by nothing. We then pass to
the light sub-trees and iterate this process, finding a positive proportion of $I_{T_0}$ occupied by
increasingly lighter sub-trees. After about $O(C_0/\delta)$ steps, we must terminate, finding a
positive proportion of $I_{T_0}$ which is not covered by any further sub-trees. We then pass to
the remaining portion of $I_{T_0}$ and all the heavy trees which have until now been neglected,
and iterate once again; since we have replaced $I_{T_0}$ with a strictly smaller fraction of $I_{T_0}$,
this procedure will converge geometrically to obtain the desired estimates.



We now prove Theorem 3.5. We shall drop (13) since it follows from (10). In the spirit
of Lemma 3.1, it will suffice to prove the apparently weaker



Theorem 3.6. With the above assumptions, we can find a (possibly empty) collection
$\mathbf{T}_{iterate}$ of disjoint convex trees in $T_0$ whose tops have disjoint spatial intervals and form a
$(1-\eta)$-packing of $T_0$ for some $\eta = \eta(C_0,\delta) > 0$, such that we have the disjoint partition

$T_0 = \bigcup_{T \in \mathbf{T}_{iterate}} T \cup \bigcup_{T \in \mathbf{T}} T \cup \mathbf{P}$ (14)

where the trees $T \in \mathbf{T}$ obey (12), and $\mathbf{P}$ and the tree tops of $\mathbf{T}$ are both uniform $C(C_0,\delta)$-
packings of $T_0$.
Indeed, if Theorem 3.6 holds, then we can construct the collection in Theorem 3.5 by
starting with the partition (14), and then taking each of the trees in $\mathbf{T}_{iterate}$ and breaking
them up by a further application of Theorem 3.6. We continue on in this way until the
original tree $T_0$ is completely broken up into trees $T$ obeying (12) and tiles $P$ obeying
(13). The fact that $\mathbf{P}$ and the tree tops of $\mathbf{T}$ are $C(C_0,\delta)$-packings of $T_0$ then follows
from Theorem 3.6 and the fact that the geometric series $\sum_n (1-\eta)^n$ converges. A similar
argument can then be used to improve "$C(C_0,\delta)$-packing" to "uniform $C(C_0,\delta)$-packing".
We omit the details.

Proof (of Theorem 3.6). Define the quantity $c$ by

$c := \|a\|_{\mathrm{size}(T_0)}$, (15)

thus $0 \le c \le C_0$. We shall prove the theorem by induction on $c$. Specifically, we fix
$0 \le c \le C_0$ and assume that the theorem has already been proven in the case $\|a\|_{\mathrm{size}(T_0)} \le
c - \delta/2$. Note that we only have to apply this induction a finite number of times (about
$O(C_0/\delta)$) so we will be allowed to let the constants get worse with each induction step.
The main lemma used in the proof of the Theorem will be

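Lemma 3.7. We can partition

$T_0 = \bigcup_{T \in \mathbf{T}_{small}} T \cup \mathbf{P}_{buffer} \cup \bigcup_{T \in \mathbf{T}_{heavy}} T \cup \bigcup_{T \in \mathbf{T}_{light}} T$ (16)

where $\mathbf{T}_{small}$ is a collection of convex trees which all obey (12) and whose tree tops are
a uniform 4-packing of $T_0$, $\mathbf{P}_{buffer}$ is a uniform 3-packing of $T_0$, and $\mathbf{T}_{heavy}$, $\mathbf{T}_{light}$ are
collections of disjoint convex sub-trees of $T_0$ which are complete with respect to $T_0$, and
are such that we have the tree counting estimates

$\sum_{T \in \mathbf{T}_{heavy}} |I_T| + \sum_{T \in \mathbf{T}_{light}} |I_T| \le |I_{T_0}|$ (17)

$\sum_{T \in \mathbf{T}_{heavy}} |I_T| \le \frac{c}{c + \delta/2} |I_{T_0}|$ (18)

and the size bounds

$\|a\|_{\mathrm{size}(T)} \le c - \delta/2$ for all $T \in \mathbf{T}_{light}$. (19)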

Proof. Define $\mathbf{T}_{fluctuate}$ to be those sub-trees $T$ of $T_0$ which are complete with respect
to $T_0$, such that $|\, \|a\|_{\mathrm{size}(T)} - c\,| > \delta/2$, and such that $T$ is maximal with respect to set
inclusion and the above two properties. Note that such trees are automatically convex.

By construction, none of the trees in $\mathbf{T}_{fluctuate}$ contain the top tile $P_{T_0}$. We may
subdivide

$\mathbf{T}_{fluctuate} = \mathbf{T}_{heavy} \cup \mathbf{T}_{light}$

where $\mathbf{T}_{heavy}$ consists of those trees $T \in \mathbf{T}_{fluctuate}$ with

$\|a\|_{\mathrm{size}(T)} > c + \delta/2$ (20)

and $\mathbf{T}_{light}$ consists of those trees $T \in \mathbf{T}_{fluctuate}$ with

$\|a\|_{\mathrm{size}(T)} < c - \delta/2$.



Footnote: With reference to Figure 3.1, $\mathbf{T}_{fluctuate}$ consists of the h and l trees, $T_1$ consists of the s and b tiles,
and $T_2$ consists of just the s tiles.
Figure 4. A convex tree $T_0$ and its decomposition from Lemma 3.7. The
circled h and l tiles are the tops of maximal sub-trees of $T_0$ for which the
size of $a$ fluctuates by at least $\delta/2$ from $c$; the uncircled h and l tiles are
the remaining tiles in those maximal sub-trees. $\mathbf{T}_{heavy}$ thus consists of the
(three) h trees while $\mathbf{T}_{light}$ consists of the (two) l trees. $\mathbf{P}_{buffer}$ consists of
those remaining tiles (labeled b) which lie just below a heavy or light tile,
or are at the very top of the phase plane. The remaining tiles (labeled s)
form the (three) small trees $\mathbf{T}_{small}$.



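The trees in $\mathbf{T}_{fluctuate}$ are disjoint, convex, and have disjoint spatial supports, so (17)
holds. On the other hand if one multiplies (20) by $|I_T|$ and sums over all $T \in \mathbf{T}_{heavy}$ one
obtains

$|I_{T_0}|\, c = \sum_{P \in T_0} a(P) \ge \sum_{T \in \mathbf{T}_{heavy}} \sum_{P \in T} a(P) \ge (c + \delta/2) \sum_{T \in \mathbf{T}_{heavy}} |I_T|$.

Dividing by $c + \delta/2$ we obtain (18).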

Let $T_1$ denote the convex tree $T_1 := T_0 \setminus \bigcup_{T \in \mathbf{T}_{fluctuate}} T$ with top $P_{T_0}$. Informally, $T_1$
represents the portion of $T_0$ below the fluctuating tiles. The tree $T_1$ contains $P_{T_0}$ and is
hence non-empty. Let $\mathbf{P}_{buffer}$ denote the tiles

$\mathbf{P}_{buffer} := \{P \in T_1 : P = 2Q \text{ for some } Q \notin T_1\} \cup \{P \in T_1 : |I_P| = 2^{-M}\}$.

Footnote: Here we are taking advantage of our decision to work in a finite model, where the tiles have a minimal
width $2^{-M}$. One can replicate this argument in the infinite setting but one has to treat the portion of
$T_1$ which "goes all the way to infinity" separately. See |Q.
In other words, $\mathbf{P}_{buffer}$ consists of those tiles in $T_1$ which touch the upper boundary
of $T_1$ (which in particular may include the tiles of minimal width $|I_P| = 2^{-M}$). Since
the tiles $Q$ in the definition of $\mathbf{P}_{buffer}$ have disjoint spatial supports and $|I_P| = 2|I_Q|$
we see that $\{P \in T_1 : P = 2Q \text{ for some } Q \notin T_1\}$ is a uniform 2-packing of $T_1$. Since
$\{P \in T_1 : |I_P| = 2^{-M}\}$ is clearly a uniform 1-packing of $T_1$, we thus see that $\mathbf{P}_{buffer}$ is a
uniform 3-packing of $T_1$.

Let $T_2$ denote the (possibly empty) tree $T_2 := T_1 \setminus \mathbf{P}_{buffer}$ with top $P_{T_0}$. This tree is not
necessarily convex; however, we shall invoke the following lemma to split it into convex
trees.

Lemma 3.8. Let $T$ be a convex tree, and let $\mathbf{P} \subset T$ be a uniform $a$-packing of $T$ for
some $a > 0$. Then $T \setminus \mathbf{P}$ can be partitioned into

$T \setminus \mathbf{P} = \bigcup_{T' \in \mathbf{T}} T'$

where $\mathbf{T}$ is a collection of convex trees $T'$ whose tops $\{P_{T'} : T' \in \mathbf{T}\}$ form a uniform
$(a + 1)$-packing of $T$.

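Proof. Let $\mathbf{Q}$ denote those dyadic intervals $Q \subseteq I_T$ such that $Q \in T \setminus \mathbf{P}$ and $2Q \notin T \setminus \mathbf{P}$.
For any $Q \in \mathbf{Q}$, we see that either $Q = P_T$ or $2Q \in \mathbf{P}$. Since $\mathbf{P}$ is a uniform $a$-packing,
this implies that $\mathbf{Q}$ is a uniform $(a + 1)$-packing.

For each $Q \in \mathbf{Q}$, define the convex tree $T_Q$ with top $Q$ by

$T_Q := \{P \in T \setminus \mathbf{P} : P \le' Q$, and there does not exist $Q' \in \mathbf{Q}$ such that $P \le' Q' <' Q\}$.

If we then set $\mathbf{T} := \{T_Q : Q \in \mathbf{Q}\}$ we see that the Lemma follows. ■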
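The construction in this proof can be sketched in a few lines. The block below is only an illustration under stated assumptions: each tile is identified with its spatial dyadic interval, encoded as a pair (start, length) of integers, and the names parent and split_into_convex_trees are hypothetical helpers introduced here, not notation from the text. A tile of $T \setminus \mathbf{P}$ becomes a top exactly when its dyadic parent is missing from $T \setminus \mathbf{P}$, and every other tile is attached to the first top found by climbing through its ancestors.

```python
def parent(tile):
    """Dyadic parent 2Q of a tile Q = (start, length)."""
    start, length = tile
    return (start - start % (2 * length), 2 * length)


def split_into_convex_trees(tree, removed):
    """Partition `tree` minus `removed` into trees, keyed by their top tile.

    Assumption: `tree` is a convex tree of dyadic intervals and
    `removed` is a subset of it (the packing P of Lemma 3.8).
    """
    rest = set(tree) - set(removed)
    # A tile is a top when its parent is not in the remaining collection.
    tops = {q for q in rest if parent(q) not in rest}
    pieces = {q: set() for q in tops}
    for p in rest:
        q = p
        while q not in tops:  # climb to the first top at or above p
            q = parent(q)
        pieces[q].add(p)
    return pieces


# Complete tree over [0, 8) with two tiles removed.
tree = {(s, l) for l in (1, 2, 4, 8) for s in range(0, 8, l)}
pieces = split_into_convex_trees(tree, {(0, 4), (6, 2)})
```

Every tile strictly between a tile and its top is visited during the climb, so each piece is convex; the tops are either the original top or children of removed tiles, which is the source of the $(a+1)$-packing bound.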

By Lemma 3.8 we may write $T_2 = \bigcup_{T \in \mathbf{T}_{small}} T$ where the trees in $\mathbf{T}_{small}$ are distinct
and the tree tops of $\mathbf{T}_{small}$ are a uniform 4-packing of $T_1$.

We now verify that each tree $T \in \mathbf{T}_{small}$ obeys (12). It suffices to show that

$\sum_{P \in \mathrm{Tree}(I) \cap T} a(P) \le \delta |I|$ (21)

for all $I \subseteq I_{T_0}$.

Fix $I$. The idea is to write $\mathrm{Tree}(I) \cap T$ as the difference of trees, each of which has size
$c + O(\delta)$.

We may assume that $P^+(I) \in T$ since the claim is trivial otherwise. We observe that

$\mathrm{Tree}(I) \cap T = (\mathrm{Tree}(I) \cap T_0) \setminus \bigcup_{J \in \mathbf{J}} (\mathrm{Tree}(J) \cap T_0)$

where $\mathbf{J}$ consists of those intervals $J \subseteq I$ such that $P^+(J) \notin T$, and which are maximal
with respect to this property.

The tile $P^+(I)$ is in $T$ and hence in $T_1$. By construction of $T_1$, we thus have

$\sum_{P \in \mathrm{Tree}(I) \cap T_0} a(P) = |I|\, \|a\|_{\mathrm{size}(\mathrm{Tree}(I) \cap T_0)} \le |I| (c + \delta/2)$

(since otherwise $\mathrm{Tree}(I) \cap T_0$ would belong to $\mathbf{T}_{heavy}$, a contradiction). Similarly, for every
$J \in \mathbf{J}$, the tile $P^+(J)$ is contained in $T_1$ (otherwise $P^+(2J)$ would be both in $T$ and in
$\mathbf{P}_{buffer}$, a contradiction), so

$\sum_{P \in \mathrm{Tree}(J) \cap T_0} a(P) \ge |J| (c - \delta/2)$.

By the construction of $\mathbf{J}$, the intervals $J$ in $\mathbf{J}$ partition $I$, thus

$\sum_{J \in \mathbf{J}} \sum_{P \in \mathrm{Tree}(J) \cap T_0} a(P) \ge |I| (c - \delta/2)$.

Subtracting this from the previous we obtain (21) as desired. ■

We apply the above lemma and place $\mathbf{P}_{buffer}$ into $\mathbf{P}$, $\mathbf{T}_{small}$ into $\mathbf{T}$, and $\mathbf{T}_{heavy}$ into
$\mathbf{T}_{iterate}$. For the remaining trees $\mathbf{T}_{light}$ we use the induction hypothesis, which splits each
of the trees in $\mathbf{T}_{light}$ into $\mathbf{T}_{iterate}$, $\mathbf{T}$, and $\mathbf{P}$. All the desired conclusions of Theorem 3.6 are
easily verified except perhaps for the claim that the tops of $\mathbf{T}_{iterate}$ form a $(1-\eta)$-packing
of $T_0$, or in other words

$\sum_{T \in \mathbf{T}_{iterate}} |I_T| \le (1 - \eta) |I_{T_0}|$.

To prove this inequality, note that the trees in $\mathbf{T}_{heavy}$ contribute $\sum_{T \in \mathbf{T}_{heavy}} |I_T|$ to the
left-hand side, while from the induction hypothesis the trees in $\mathbf{T}_{light}$ contribute at most
$(1 - \eta) \sum_{T \in \mathbf{T}_{light}} |I_T|$ for some $\eta > 0$. There are no other contributions. The claim then
follows from (17), (18) (reducing the value of $\eta$ as necessary). ■

The constants $C(C_0, \delta)$ given by this argument are exponential in $C_0/\delta$. This bound can
be improved substantially; see below.

The following corollary allows one to use one Carleson measure $\mu$ to prove the Carleson
measure property of a related measure $\mu'$. It is the dyadic version of an extrapolation
lemma in [Q], which in turn is based on ideas in |]32[ .

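Corollary 3.9 (Extrapolation of Carleson measures). Let $\mu : \mathbf{P}^+ \to \mathbb{R}^+$ have bounded
maximal size and let $\delta > 0$. Let $\mu'$ be a non-negative measure on $\mathbf{P}^+$ obeying the "weak
Carleson condition"

$\mu'(P) \le C_1 |I_P|$ for all $P \in \mathbf{P}^+$

and such that $\|\mu'\|_{\mathrm{size}(T)} \le C_2$ for all convex trees $T$ such that $\|\mu\|_{\mathrm{size}^*(T)} \le \delta$. Then $\mu'$
also has bounded maximal size:

$\|\mu'\|_{\mathrm{size}^*(\mathbf{P}^+)} \le C(\|\mu\|_{\mathrm{size}^*(\mathbf{P}^+)}, \delta)(C_1 + C_2)$.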

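Proof. Let $T_0$ be any convex tree. We need to show that

$\|\mu'\|_{\mathrm{size}(T_0)} \le C(\|\mu\|_{\mathrm{size}^*(\mathbf{P}^+)}, \delta)(C_1 + C_2)$.

By Theorem 3.5, we can partition $T_0 = \bigcup_{T \in \mathbf{T}} T \cup \mathbf{P}$ where $\|\mu\|_{\mathrm{size}^*(T)} \le \delta$ for all $T \in \mathbf{T}$,
and

$\sum_{T \in \mathbf{T}} |I_T| + \sum_{P \in \mathbf{P}} |I_P| \le C(\|\mu\|_{\mathrm{size}^*(\mathbf{P}^+)}, \delta) |I_{T_0}|$.

From this and the assumptions on $\mu'$ we see that

$\sum_{P \in T_0} \mu'(P) = \sum_{T \in \mathbf{T}} \sum_{P \in T} \mu'(P) + \sum_{P \in \mathbf{P}} \mu'(P) \le C(\|\mu\|_{\mathrm{size}^*(\mathbf{P}^+)}, \delta) C_2 |I_{T_0}| + C(\|\mu\|_{\mathrm{size}^*(\mathbf{P}^+)}, \delta) C_1 |I_{T_0}|$

and the claim follows. ■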
As mentioned earlier, this lemma has applications to the Kato problem. In P|, this
lemma was used to establish a restricted version of Kato's conjecture, for perturbations of
real, symmetric coefficient matrices. In that case, $\mu$ was a Carleson measure which controlled
the original operator, and $\mu'$ was the analogous measure controlling the perturbed
operator. The point was to establish that $\mu'$ was also a Carleson measure. We remark
that the fact that the final bound on $\mu'$ was linear in $C_2$ was crucial to this application.

It is possible to eliminate the weak Carleson condition by allowing the tree measured
by $\mu'$ to be a little larger than the tree measured by $\mu$, but we will not pursue this type
of generalization here.



3.2. An alternate argument. In this section we give an alternate proof of Theorem
3.5, due to John Garnett (personal communication). The idea of this argument is similar
to some arguments in g|.

Fix $T_0$, $a$. We first observe that it suffices to prove the theorem under the additional
"weak Carleson" assumption

$a(P) \le \frac{\delta}{2} |I_P|$ for all $P \in T_0$. (22)

To see this, suppose that we are in the general case when (22) need not hold. We set $\mathbf{P}$
to be the set of tiles where (22) fails:

$\mathbf{P} := \{P \in T_0 : a(P) > \frac{\delta}{2} |I_P|\}$. (23)

From (10) we see that $\mathbf{P}$ is a uniform $2C_0/\delta$-packing of $T_0$ (cf. Lemma 3.1). By Lemma
3.8 we thus see that we can split $T_0 \setminus \mathbf{P}$ into a collection of disjoint convex subtrees of $T_0$,
whose tops form a uniform $(2C_0/\delta + 1)$-packing of $T_0$. On each such subtree (22) holds.
Thus if we apply Theorem 3.5 to each sub-tree and then combine all the decompositions,
we obtain the desired decomposition for the original tree $T_0$ (with the constants $C(C_0, \delta)$
worsened by a factor of $2C_0/\delta + 1$).



Henceforth we assume (22). Under this assumption we will not need $\mathbf{P}$ any more, and
will set it equal to the empty set.

We can assume without loss of generality that Tq is a complete tree, since if Tq is 
incomplete then one can replace Tq by its completion, and extend a by zero; note that 
the intersection of two convex trees is convex, so one does not lose convexity when one 
restricts back to Tq. 

We can now make the technical assumption that the minimal tiles have large coefficient:

$a(P) = \frac{\delta}{2} |I_P|$ whenever $P \in T_0$ and $|I_P| = 2^{-M}$. (24)

This is because in the general case one can simply increase $a(P)$ for these tiles to equal
$\frac{\delta}{2}|I_P|$; observe that this only increases $\|a\|_{\mathrm{size}^*(T)}$ by at most $\delta/2$, so the claim follows
by redefining $C_0$ as necessary. This technical assumption is needed to make sure that a
certain stopping argument always halts before it reaches the smallest scale.

We now use the greedy algorithm to select a subtree $T$ of $T_0$ of size roughly comparable
to $\delta$:
Lemma 3.10. Let $T_0$ be a complete tree, and let $a : T_0 \to \mathbb{R}^+$ obey (22) and (24). Then
there exists a convex subtree $T \subseteq T_0$ of $T_0$ with $P_T = P_{T_0}$ such that

$\frac{\delta}{2} \le \|a\|_{\mathrm{size}(T)} \le \|a\|_{\mathrm{size}^*(T)} \le \delta$.

Proof. Consider the top tile $P_{T_0}$. If $a(P_{T_0}) = \frac{\delta}{2}|I_{T_0}|$ then we can set $T$ to be the singleton
tree $\{P_{T_0}\}$, so we can assume that $a(P_{T_0})$ is strictly less than $\frac{\delta}{2}|I_{T_0}|$.

Let $\mathbf{T}$ be the class of all convex subtrees $T' \subseteq T_0$ with top $P_{T'} = P_{T_0}$ such that

$\|a\|_{\mathrm{size}^*(T')} < \delta/2$. (25)

By the previous paragraph, this class $\mathbf{T}$ is non-empty; also, by (24), the trees in this class cannot
contain any tiles with the minimal width $2^{-M}$.

Let $T_*$ be a tree in $\mathbf{T}$ which is maximal with respect to set inclusion. Let $\mathbf{P}$ denote
the set of tiles $P$ in $T_0 \setminus T_*$ such that $2P \in T_*$; these are the tiles which lie just above $T_*$.
Since $T_0$ is complete and $T_*$ does not contain any tiles of minimal width, we see that the
spatial intervals $\{I_P : P \in \mathbf{P}\}$ partition $I_{T_0}$. In particular the $\mathbf{P}$ are a uniform 1-packing
of $T_0$.

Set $T := T_* \cup \mathbf{P}$; this is clearly a convex sub-tree of $T_0$ with top $P_T = P_{T_0}$. From (25),
(22), and the uniform 1-packing property of $\mathbf{P}$ we see that $\|a\|_{\mathrm{size}^*(T)} \le \delta$ (cf. Lemma
2.1). It thus remains to show the lower bound on size, i.e.

$\sum_{P \in T} a(P) \ge \frac{\delta}{2} |I_{T_0}|$. (26)

Call a tile $Q \in T$ heavy if $\|a\|_{\mathrm{size}(\mathrm{Tree}(Q) \cap T)} \ge \frac{\delta}{2}$, or in other words

$\sum_{P \in T : P \le' Q} a(P) \ge \frac{\delta}{2} |I_Q|$. (27)

Observe that for every tile $P \in \mathbf{P}$, there must exist a heavy tile $Q \in T$ such that
$P \le' Q$, since otherwise one could add $P$ to the tree $T_*$ while retaining the property (25),
contradicting the maximality of $T_*$.

Let $\mathbf{Q}$ denote the set of heavy tiles $Q$ in $T$ which are maximal with respect to the
ordering $<'$. By the previous paragraph we see that the spatial intervals of $\mathbf{Q}$ partition
$I_{T_0}$. If one adds up (27) for all such tiles one obtains (26). The proof of the lemma is now
complete. ■

When one removes $T$ from the complete tree $T_0$ we obtain a union of disjoint complete
trees, which are of course smaller than $T_0$ but still obey (22) and (24). Thus we can iterate
the above lemma to obtain

Corollary 3.11. Let $T_0$ be a complete tree, and let $a : T_0 \to \mathbb{R}^+$ obey (22) and (24).
Then we can partition $T_0 = \bigcup_{T \in \mathbf{T}} T$ where $\mathbf{T}$ is a collection of disjoint convex trees $T$
such that

$\frac{\delta}{2} \le \|a\|_{\mathrm{size}(T)} \le \|a\|_{\mathrm{size}^*(T)} \le \delta$

for all $T \in \mathbf{T}$.
We apply the above Corollary to the tree $T_0$ in the Theorem, obtaining the collection
$\mathbf{T}$ of trees. We now claim that the tops $\{P_T : T \in \mathbf{T}\}$ of these trees form a uniform
$2C_0/\delta$-packing of $T_0$. Indeed, for any dyadic interval $J \subseteq I_{T_0}$ we see that

$\frac{\delta}{2} \sum_{T \in \mathbf{T} : I_T \subseteq J} |I_T| \le \sum_{P \in T_0 : I_P \subseteq J} a(P) \le \|a\|_{\mathrm{size}^*(T_0)} |J| \le C_0 |J|$.

This completes the proof of Theorem 3.5 (with $\mathbf{P}$ empty). ■

Observe that the constants obtained in this manner are significantly superior to the
previous argument, being polynomial in $C_0/\delta$ instead of exponential.

4. The "$L^p$ theory"

In the previous section we established some estimates in the case when the maximal size
was bounded; this can be thought of as the "$L^\infty$ theory" of maximal size. Now we study
what happens when our collection of tiles does not have a good maximal size bound. In
this case we can subdivide the collection into disjoint trees, such that the size of each of
the trees is under control:

Lemma 4.1 ("Calderón-Zygmund decomposition for size"). Let $n \in \mathbb{Z}$, $\mathbf{P}_n$ be a convex
collection of lacunary tiles, and let $a : \mathbf{P}_n \to \mathbb{R}^+$ be a function such that $\|a\|_{\mathrm{size}^*(\mathbf{P}_n)} \le 2^n$.
Then there exists a disjoint partition

$\mathbf{P}_n = \bigcup_{T \in \mathbf{T}_n} T \cup \mathbf{P}_{n-1}$ (28)

where $\mathbf{P}_{n-1}$ is a convex collection of tiles such that

$\|a\|_{\mathrm{size}^*(\mathbf{P}_{n-1})} \le 2^{n-1}$ (29)

and $\mathbf{T}_n$ is a collection of convex trees $T$ with disjoint spatial intervals $I_T$ such that

$\|a\|_{\mathrm{size}(T)} \sim \|a\|_{\mathrm{size}^*(T)} \sim 2^n$ (30)

for all $T \in \mathbf{T}_n$.

In the particular case that $a = |Wf|^2$ for some $f \in S_0$, we then have

$\|\Pi_T f\|_{BMO} \sim 2^{n/2}$ (31)

and

$\|\Pi_T f\|_p \sim |I_T|^{1/p} 2^{n/2}$ (32)

for all $0 < p < \infty$ (with the implicit constant depending on $p$).

Proof. We set $\mathbf{T}_n$ to be the collection of all trees $T \subseteq \mathbf{P}_n$ such that $\|a\|_{\mathrm{size}(T)} > 2^{n-1}$,
and are maximal with respect to set inclusion. Clearly these trees are disjoint (otherwise
the union of the two trees would also qualify, and contradict maximality). They are also
complete with respect to $\mathbf{P}_n$ (i.e. $T = \mathrm{Tree}(I_T) \cap \mathbf{P}_n$) and thus convex. One then sets

$\mathbf{P}_{n-1} := \mathbf{P}_n \setminus \bigcup_{T \in \mathbf{T}_n} T$. (33)

The properties (28), (29), (30) are easily verified. To prove the last two properties, observe
from (30) that

$\|\Pi_T f\|_{BMO} = \|\,|Wf|^2\|^{1/2}_{\mathrm{size}^*(T)} \sim 2^{n/2}$

and

$\|\Pi_T f\|_2 = |I_T|^{1/2} \|\,|Wf|^2\|^{1/2}_{\mathrm{size}(T)} \sim 2^{n/2} |I_T|^{1/2}$.

The claim then follows from the John-Nirenberg inequality (Lemma 3.4) and Hölder's
inequality. ■

The above Lemma should be compared with the standard Calderón-Zygmund decomposition,
which if given a function $f$ with $\|f\|_\infty \le 2^n$, will subdivide $f = g + \sum_I b_I$
where $\|g\|_\infty \le 2^{n-1}$, the $b_I$ are supported on disjoint intervals $I$ and have mean zero and
$\|b_I\|_p \sim 2^n |I|^{1/p}$ for all $0 < p \le \infty$. The trees in $\mathbf{T}_n$ are the analogues of the intervals
$I$, and can be thought of as the region of phase space where $a$ (or $f$) "has size $\sim 2^n$".



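Lemma 4.1 can also be thought of as a sort of BMO version of Chebyshev's inequality

$|\{x : |f(x)| \ge 2^{n/2}\}| \le 2^{-np/2} \|f\|_p^p$. (34)

Indeed, if $f$, $n$, $\mathbf{T}_n$ are as in the above lemma, then by (32) and the disjointness of the $I_T$
we have

$\|f\|_p^p \gtrsim \sum_{T \in \mathbf{T}_n} \|\Pi_T f\|_p^p \sim 2^{np/2} \sum_{T \in \mathbf{T}_n} |I_T|$

and hence

$|\bigcup_{T \in \mathbf{T}_n} I_T| \lesssim 2^{-np/2} \|f\|_p^p$ (35)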

(compare with (0), (0), (|2|)). 

In practice one iterates the above lemma, starting with a large $n$ and decrementing $n$
repeatedly, thus decomposing $\mathbf{P}^+$ into trees of various sizes (plus a remainder of size 0).
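One step of this selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: tiles are identified with their dyadic spatial intervals (start, length), only complete trees Tree(I) restricted to the collection are considered, and the names contains, tree_size and select_trees are hypothetical. The parameter threshold plays the role of $2^{n-1}$.

```python
def contains(big, small):
    """True if the dyadic interval `small` lies inside `big`."""
    return big[0] <= small[0] and small[0] + small[1] <= big[0] + big[1]


def tree_size(top, a):
    """(1/|I|) * sum of a(P) over the tiles P of the collection inside `top`."""
    return sum(w for tile, w in a.items() if contains(top, tile)) / top[1]


def select_trees(a, threshold):
    """Pick the maximal tiles whose tree has size > threshold.

    `a` maps tiles (start, length) to non-negative weights.
    Returns the selected tops and the weights on the remaining tiles.
    """
    tops = []
    # Scanning widest tiles first guarantees that selected tops are maximal.
    for tile in sorted(a, key=lambda t: (-t[1], t[0])):
        if any(contains(q, tile) for q in tops):
            continue  # already inside a selected tree
        if tree_size(tile, a) > threshold:
            tops.append(tile)
    rest = {t: w for t, w in a.items()
            if not any(contains(q, t) for q in tops)}
    return tops, rest


tiles = [(s, l) for l in (8, 4, 2, 1) for s in range(0, 8, l)]
a = {t: 0.0 for t in tiles}
a.update({(0, 8): 1.0, (4, 4): 2.0, (4, 2): 8.0, (0, 1): 3.0})
tops, rest = select_trees(a, 2.0)
```

Because the weights are non-negative, every tree left in the remainder has size at most the threshold, which mirrors (29). Iterating with thresholds $2^{n-1}, 2^{n-2}, \ldots$ on the successive remainders reproduces the decomposition described above.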

We have a similar selection lemma for mean, which can be thought of as the analogue
of the previous lemma for the non-lacunary tiles $\mathbf{P}^0$.

Lemma 4.2 ("Calderón-Zygmund decomposition for mean"). Let $n \in \mathbb{Z}$, $\mathbf{P}_n$ be a convex
collection of lacunary tiles, and let $f \in S$ be such that $\|f\|_{\mathrm{mean}^*(\mathbf{P}_n)} \le 2^n$. Then there
exists a disjoint partition

$\mathbf{P}_n = \bigcup_{T \in \mathbf{T}_n} T \cup \mathbf{P}_{n-1}$ (36)

where $\mathbf{P}_{n-1}$ is a convex collection of tiles such that

$\|f\|_{\mathrm{mean}^*(\mathbf{P}_{n-1})} \le 2^{n-1}$ (37)

and $\mathbf{T}_n$ is a collection of convex trees $T$ with disjoint spatial intervals $I_T$ such that

$\|f\|_{\mathrm{mean}(P_T)} \sim \|f\|_{\mathrm{mean}^*(T)} \sim 2^n$ (38)

for all $T \in \mathbf{T}_n$. In particular, we have

$\sum_{T \in \mathbf{T}_n} |I_T| \lesssim 2^{-n} \int_{|f| \gtrsim 2^n} |f| \lesssim 2^{-np} \|f\|_p^p$ (39)

for any $1 \le p < \infty$ (with the implicit constant depending on $p$).

Proof. We define $\mathbf{T}_n$ to be the set of all trees $T$ such that $\|f\|_{\mathrm{mean}^*(T)} > 2^{n-1}$, and are
maximal with respect to set inclusion, and then define $\mathbf{P}_{n-1}$ by (33). As before the trees
$T$ are convex and complete with respect to $\mathbf{P}_n$. The properties (36), (37), (38) are easily
verified. By (38) we have $\int_{x \in I_T : |f(x)| \gtrsim 2^n} |f| \gtrsim 2^n |I_T|$, and (39) follows from the disjointness
of the $I_T$. ■

In later sections we give some applications of the above machinery. These applications
will all have a similar flavor, in that they follow the following broad strategy:

• Begin with a sum over a collection of lacunary tiles.

• Use Lemma 4.1 and/or Lemma 4.2 to extract disjoint trees in this collection of a
certain size (plus a remainder of size 0, which is usually trivial to handle).

• Estimate the contribution of each tree in terms of the width $|I_T|$ of the tree and the
size and/or mean of the tree.

• Estimate the total width $\sum |I_T|$ of the trees (using such estimates as (35), (39)).

• Remove these trees from the collection, and repeat the above steps until the collection
has been exhausted.

• Sum up.

This type of argument is fundamental to the general theory of tiles, as can be seen in
the work on the bilinear Hilbert transform and Carleson's theorem (see e.g. |^5[, [^], ^
p6|). Apart from several technical details, the main differences between the arguments
in those papers and the ones here are to eliminate the word "lacunary" from the above
strategy, and replace the above Lemmata by more sophisticated tree selection algorithms.

4.1. Example: Atomic decomposition of dyadic $H^p$. Let $0 < p \le 1$; implicit constants
will be allowed to depend on $p$. In this section we reprove the standard atomic
decomposition of $H^p$, but we first need some notation.

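For any $f \in S$, we define the dyadic Littlewood-Paley square function $Sf$ to be the
vector-valued function

$Sf(x) := (\Pi_P f(x))_{P \in \mathbf{P}^+} = (\langle f, \phi_P \rangle \phi_P(x))_{P \in \mathbf{P}^+} = (Wf(P) \phi_P(x))_{P \in \mathbf{P}^+}$

taking values in $l^2(\mathbf{P}^+)$. Note that

$|Sf(x)| = \Big( \sum_{P \in \mathbf{P}^+} |Wf(P)|^2 \frac{\chi_{I_P}(x)}{|I_P|} \Big)^{1/2}$. (40)

The adjoint operator $S^*$ takes $l^2(\mathbf{P}^+)$-valued functions to scalar-valued functions, and is
given by the formula

$S^* (f_P)_{P \in \mathbf{P}^+} = \sum_{P \in \mathbf{P}^+} \Pi_P f_P$.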
PeP+ 
In particular $S^* S$ is the identity on $S_0$. Also observe that $|WSf(P)| = |Wf(P)|$ for all
$P \in \mathbf{P}^+$, so in particular we have

$\|Sf\|_2 = \|f\|_2$, $\|Sf\|_{BMO} = \|f\|_{BMO}$ (41)

on $S_0$ and $S$ respectively.

We also define the cancellative dyadic maximal function $Mf$ by

$Mf(x) := \sup_{P \in \mathbf{P}^0} |\Pi_P f(x)| = \sup_{I : x \in I} \Big| \frac{1}{|I|} \int_I f \Big| = \sup_{I : x \in I} |[f]_I|$;

this operator should not be confused with the dyadic Hardy-Littlewood maximal function 
Mf := M\f\. From (|) we have 

$|\Pi_T f(x)| \le 2 Mf(x)$ (42)

whenever $T$ is a convex tree in $\mathbf{P}^+$.

If $\mathrm{Tree}(I)$ is a complete tree, we define a (dyadic) atom on $\mathrm{Tree}(I)$ to be a function
$a \in S_0(I)$ such that $\|a\|_2 \le |I|^{1/2 - 1/p}$. Equivalently, $a \in S_0$ is an atom on $\mathrm{Tree}(I)$ if and
only if the wavelet transform $Wa$ of $a$ is supported on $\mathrm{Tree}(I)$ and $\|\,|Wa|^2\|_{\mathrm{size}(\mathrm{Tree}(I))} \le
|I|^{-2/p}$; this is because of (|^).

In this section we show 

Theorem 4.3 (Equivalent definitions of $H^p$). Let $f \in S_0$ and $0 < p \le 1$. Then the
following statements are equivalent.

(i) $\|Sf\|_p \lesssim 1$.

(ii) $\|Mf\|_p \lesssim 1$.

(iii) There exists a collection $\mathbf{I}$ of dyadic intervals, and to each $I \in \mathbf{I}$ there exists a
non-negative number $c_I$ and an atom $a_I$ on $\mathrm{Tree}(I)$ such that $f = \sum_I c_I a_I$ and
$\sum_I c_I^p \lesssim 1$.

Proof. We first show that (iii) implies (i) and (ii). From the quasi-triangle inequality

$\|f + g\|_p^p \le \|f\|_p^p + \|g\|_p^p$

we see that it suffices to verify this on atoms, i.e. to show that $\|Sa_I\|_p, \|Ma_I\|_p \lesssim 1$
whenever $a_I$ is an atom on $\mathrm{Tree}(I)$.

Fix $I$, $a_I$. By construction $Ma_I$ and $Sa_I$ are supported on $I$, so by Hölder it suffices to
show $\|Sa_I\|_2, \|Ma_I\|_2 \lesssim |I|^{1/2 - 1/p}$. But this follows from the normalization of $a_I$ and
the fact that $S$, $M$ are bounded on $L^2$.

It remains to show that either one of (i) or (ii) is enough to imply (iii). Let $f$ be any
element of $S_0$, thus $f = \sum_{P \in \mathbf{P}^+} Wf(P) \phi_P$.

Set $a := |Wf|^2$. We apply Lemma 4.1 repeatedly, starting with a sufficiently large $n$
and setting $\mathbf{P}_n := \mathbf{P}^+$, and then decrementing $n$ indefinitely. Eventually one obtains a
partition

$\mathbf{P}^+ = \bigcup_{n \in \mathbb{Z}} \bigcup_{T \in \mathbf{T}_n} T \cup \mathbf{P}_{-\infty}$

Footnote: Here we are allowing $W$ to act on vector-valued functions in the obvious manner, i.e. if $f =
(f_1, \ldots, f_n)$ is a vector, then $Wf(P) := (Wf_1(P), \ldots, Wf_n(P))$. Similarly we can define vector-valued
BMO, etc.
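where the $\mathbf{T}_n$ are as in Lemma 4.1, and $\|\,|Wf|^2\|_{\mathrm{size}^*(\mathbf{P}_{-\infty})} = 0$. Thus $Wf$ vanishes on $\mathbf{P}_{-\infty}$,
and only a finite number of $\mathbf{T}_n$ are non-empty. We thus have $f = \sum_{n \in \mathbb{Z}} \sum_{T \in \mathbf{T}_n} \Pi_T f$. If
we then set $c_{I_T} := 2^{n/2} |I_T|^{1/p}$ and $a_{I_T} := \Pi_T f / c_{I_T}$, then we have

$f = \sum_{n \in \mathbb{Z}} \sum_{T \in \mathbf{T}_n} c_{I_T} a_{I_T}$.

By (32) (with $p = 2$) we see that each $a_{I_T}$ is an atom. To show (iii) it thus remains
to show that

$\sum_n \sum_{T \in \mathbf{T}_n} c_{I_T}^p = \sum_n \sum_{T \in \mathbf{T}_n} 2^{np/2} |I_T| \lesssim 1$.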

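First suppose that (i) holds. For each $n$ and each $T \in \mathbf{T}_n$, we see that

$\int_{I_T} |S\Pi_T f|^q \lesssim 2^{nq/2} |I_T|$

for all $2 \le q < \infty$ by the John-Nirenberg inequality (Lemma 3.4), (31), and (41). Also,
we have

$\int_{I_T} |S\Pi_T f|^2 \sim 2^n |I_T|$.

By Hölder we thus have

$\int_{I_T} |S\Pi_T f|^r \gtrsim 2^{nr/2} |I_T|$

for any $0 < r < p$, with the implicit constant depending on $r$. This clearly implies

$\int_{x \in I_T : |Sf(x)| \gtrsim 2^{n/2}} |Sf|^r \gtrsim 2^{nr/2} |I_T|$.

Summing over all $T \in \mathbf{T}_n$ and using the disjointness of the $I_T$ we obtain

$\int_{|Sf| \gtrsim 2^{n/2}} |Sf|^r \gtrsim 2^{nr/2} \sum_{T \in \mathbf{T}_n} |I_T|$.

Multiplying by $2^{n(p-r)/2}$ and summing over $n$ we obtain

$\sum_n 2^{n(p-r)/2} \int_{|Sf(x)| \gtrsim 2^{n/2}} |Sf|^r \gtrsim \sum_n \sum_{T \in \mathbf{T}_n} 2^{np/2} |I_T|$.

Since the left-hand side is comparable to $\|Sf\|_p^p$ the claim follows.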

Now suppose instead that (ii) holds. By (|3|), (|2|) we have $\int_{I_T} |Mf|^r \gtrsim 2^{nr/2} |I_T|$ for
all $0 < r < p$. Now we argue as with $Sf$ to obtain (iii) from (ii). ■

From the above proof we see that the atoms in fact obey the BMO bound $\|a_{I_T}\|_{BMO} \lesssim
|I_T|^{-1/p}$. One can improve this BMO control to $L^\infty$ control by repeating the argument
in the John-Nirenberg inequality (Lemma 3.4). Namely, one locates the maximal sub-intervals
where the averages of $a_{I_T}$ are large and separates off those trees, leaving behind
a bounded atom. One then repeats the process until only bounded atoms remain, in the 
spirit of Lemma We omit the details. 
Let $f$ be in BMO. From (0), (D, we have

$\|f\|_{BMO} = \sup_T \frac{1}{|I_T|^{1/2}} \|\Pi_T f\|_2$.

Clearly it suffices to take suprema over complete trees $T$, thus by (|)

$\|f\|_{BMO} = \sup_I \frac{1}{|I|^{1/2}} \Big( \int_I |f - [f]_I|^2 \Big)^{1/2}$.

By duality we thus have

$\|f\|_{BMO} = \sup_I \sup_{a \in S_0(I) : \|a\|_2 = 1} |I|^{-1/2} |\langle f, a \rangle|$ (43)

or equivalently

$\|f\|_{BMO} = \sup \{ |\langle f, a \rangle| : a \text{ is an } H^1 \text{ atom} \}$.

Thus, as is well known, BMO is the dual of $H^1$.

5. The Carleson embedding theorem and paraproducts

We now give a slight variant of the above method, in which one selects trees using
the averages $[f]_I$ instead of the sizes. This type of argument is of course very old, and
the arguments here are by no means new. On the other hand, this type of tree selection
method is a special case of the "mean selection" algorithm used (together with a size
selection algorithm) in the proof of Carleson's theorem in [41

We begin with

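Lemma 5.1 (Carleson embedding theorem). Let $\mathbf{P}$ be a collection of lacunary tiles, $a :
\mathbf{P} \to \mathbb{R}^+$ be a function, and $1 < p < \infty$. Then we have

$\sum_{P \in \mathbf{P}} a(P) |[f]_{I_P}|^p \lesssim \|a\|_{\mathrm{size}^*(\mathbf{P})} \|f\|_p^p$

for all locally integrable functions $f$, with the implicit constants depending on $p$.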

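Proof. We apply Lemma 4.2 repeatedly, starting with a sufficiently large $n$ and decrementing
$n$ repeatedly. This gives us a partition $\mathbf{P} = \bigcup_{n \in \mathbb{Z}} \bigcup_{T \in \mathbf{T}_n} T \cup \mathbf{P}_{-\infty}$ where the
$\mathbf{T}_n$ are as in Lemma 4.2 and $\|f\|_{\mathrm{mean}^*(\mathbf{P}_{-\infty})} = 0$. The contribution of $\mathbf{P}_{-\infty}$ is zero, so it
suffices to control

$\sum_n \sum_{T \in \mathbf{T}_n} \sum_{P \in T} a(P) |[f]_{I_P}|^p$.

If $P \in T \in \mathbf{T}_n$, then $|[f]_{I_P}| \le \|f\|_{\mathrm{mean}(P)} \le \|f\|_{\mathrm{mean}^*(T)} \lesssim 2^n$. From this and (|), i), (|3|)
we may estimate the previous by

$\lesssim \sum_n \sum_{T \in \mathbf{T}_n} \sum_{P \in T} a(P) 2^{np}$

$\lesssim \sum_n \sum_{T \in \mathbf{T}_n} \|a\|_{\mathrm{size}^*(\mathbf{P})} |I_T| 2^{np}$

$\lesssim \sum_n \|a\|_{\mathrm{size}^*(\mathbf{P})} 2^{np} 2^{-n} \int_{|f| \gtrsim 2^n} |f|$

$\lesssim \|a\|_{\mathrm{size}^*(\mathbf{P})} \|f\|_p^p$

as desired. ■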
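The inequality of Lemma 5.1 can be checked numerically in a finite model. The sketch below makes several assumptions that are not in the text: the real line is replaced by a grid of $n$ points with counting measure, each lacunary tile is identified with its dyadic interval (start, length), and the helper names are hypothetical. For $p = 2$ the dyadic embedding is known to hold with constant 4 (via Doob's maximal inequality), a value the text does not state, so the computed ratio should not exceed 4.

```python
import random


def dyadic_intervals(n):
    """All dyadic intervals (start, length) inside range(n); n a power of two."""
    out, length = [], n
    while length >= 1:
        out += [(s, length) for s in range(0, n, length)]
        length //= 2
    return out


def average(f, interval):
    """The average [f]_I of f over the interval I."""
    s, l = interval
    return sum(f[s:s + l]) / l


def max_size(a):
    """sup over I of (1/|I|) * sum of a(J) over intervals J inside I."""
    best = 0.0
    for (s, l) in a:
        inside = sum(w for (s2, l2), w in a.items()
                     if s <= s2 and s2 + l2 <= s + l)
        best = max(best, inside / l)
    return best


def embedding_ratio(a, f, p=2):
    """Left side of the embedding divided by size*(a) times the p-th power of the L^p norm."""
    lhs = sum(w * abs(average(f, i)) ** p for i, w in a.items())
    rhs = max_size(a) * sum(abs(x) ** p for x in f)
    return lhs / rhs


random.seed(0)
n = 32
a = {i: random.random() for i in dyadic_intervals(n)}
f = [random.gauss(0, 1) for _ in range(n)]
ratio = embedding_ratio(a, f)
```

Since the weights are non-negative, the supremum defining the maximal size is attained on complete trees, which is why max_size only scans trees of the form Tree(I).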

We can apply this theorem to various linear and bilinear operators. To do this we shall
need some notation.

For any sequence $(a_P)_{P \in \mathbf{P}^+}$ of real numbers, we define the wavelet multiplier $W^{-1} a_P W$
from $S_0$ to $S_0$ by

$W^{-1} a_P W f := \sum_{P \in \mathbf{P}^+} a_P Wf(P) \phi_P$.

Wavelet multipliers are the discrete analogue of pseudo-differential operators, with $a_P$
being the discrete analogue of a symbol $a(x, \xi)$. Observe that if $a_P$ is bounded, then
$W^{-1} a_P W$ is bounded on $L^2$ and also bounded on BMO. One can of course extend the
domain of $W^{-1} a_P W$ from $S_0$ to $S$, although some of the algebra properties are lost in doing
so (since $W$ is not injective on $S$).

Let $f$, $g$ be elements of $S$. We define the "high-low", "low-high", and "high-high"
paraproducts

$\pi_{hl}(f, g) := \sum_{P \in \mathbf{P}^+} Wf(P) [g]_P \phi_P$

$\pi_{lh}(f, g) := \sum_{P \in \mathbf{P}^+} [f]_P Wg(P) \phi_P$

$\pi_{hh}(f, g) := \sum_{P \in \mathbf{P}^+} Wf(P) Wg(P) \frac{\chi_{I_P}}{|I_P|}$.

These paraproducts have the symmetries

$\int \pi_{hh}(f, g) h = \int \pi_{hl}(g, h) f = \int \pi_{lh}(h, f) g$

$= \int \pi_{hh}(g, f) h = \int \pi_{hl}(f, h) g = \int \pi_{lh}(h, g) f$ (44)

$= \sum_{P \in \mathbf{P}^+} Wf(P) Wg(P) [h]_P$

and can be expressed in terms of the Littlewood-Paley square function $S$:

$\pi_{hl}(f, g) = S^*(g\, Sf)$; $\pi_{lh}(f, g) = S^*(f\, Sg)$; $\pi_{hh}(f, g) = Sf \cdot Sg$.

When $f$, $g$ have mean zero (i.e. $f, g \in S_0$), then the paraproducts decompose the pointwise
product operator:

$fg = \pi_{hl}(f, g) + \pi_{lh}(f, g) + \pi_{hh}(f, g)$. (45)

To see this, it suffices by bilinearity to reduce to the case when $f = \phi_P$ and $g = \phi_Q$ for
some $P, Q \in \mathbf{P}^+$. If $I_P$ and $I_Q$ are disjoint then both sides are zero. Thus there are only
three cases: $P >' Q$, $P <' Q$, and $P = Q$. In these three cases the reader may easily
verify that $fg$ is equal to the high-low, low-high, or high-high paraproduct of $f$ and $g$
respectively, and that the other two paraproducts vanish.
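The identity (45) can also be verified numerically. The following sketch assumes a finite model that is not part of the text: functions are lists of $n = 2^k$ values with counting measure, $\phi_P$ is the $L^2$-normalised Haar function on the dyadic interval $I_P$, and $[f]_P$ is the average of $f$ over $I_P$. All function names are hypothetical.

```python
import math
import random


def haar_intervals(n):
    """Dyadic intervals (start, length) of length >= 2; each carries one Haar function."""
    out, length = [], n
    while length >= 2:
        out += [(s, length) for s in range(0, n, length)]
        length //= 2
    return out


def haar(interval, n):
    """L2-normalised Haar function on the interval, as a list of n values."""
    s, l = interval
    h = l // 2
    return [(1 if s <= x < s + h else -1 if s + h <= x < s + l else 0)
            / math.sqrt(l) for x in range(n)]


def paraproducts(f, g):
    """Return the high-low, low-high and high-high paraproducts of f and g."""
    n = len(f)
    hl, lh, hh = [0.0] * n, [0.0] * n, [0.0] * n
    for (s, l) in haar_intervals(n):
        phi = haar((s, l), n)
        wf = sum(x * y for x, y in zip(f, phi))  # wavelet coefficient Wf(P)
        wg = sum(x * y for x, y in zip(g, phi))  # wavelet coefficient Wg(P)
        avg_f = sum(f[s:s + l]) / l              # average [f]_P
        avg_g = sum(g[s:s + l]) / l              # average [g]_P
        for x in range(s, s + l):
            hl[x] += wf * avg_g * phi[x]
            lh[x] += avg_f * wg * phi[x]
            hh[x] += wf * wg / l                 # phi_P squared is chi_I / |I|
    return hl, lh, hh


def mean_zero(n, rng):
    """A random vector of length n with mean zero."""
    v = [rng.gauss(0, 1) for _ in range(n)]
    m = sum(v) / n
    return [x - m for x in v]


rng = random.Random(1)
f, g = mean_zero(16, rng), mean_zero(16, rng)
hl, lh, hh = paraproducts(f, g)
error = max(abs(f[x] * g[x] - hl[x] - lh[x] - hh[x]) for x in range(16))
```

The mean-zero assumption matters here: a constant component of $f$ or $g$ is invisible to the wavelet coefficients, so without it the three paraproducts would not recover the full product.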

^*The continuous counterparts would be something like $\int (Q_t f)(P_t g)\,\frac{dt}{t}$, $\int (P_t f)(Q_t g)\,\frac{dt}{t}$, and $\int (Q_t f)(Q_t g)\,\frac{dt}{t}$, where $Q_t$ is as before and $P_t$ is a suitable approximation to the identity at width $t$, e.g. $P_t = e^{t^2\Delta}$. The precise definition of a paraproduct is not standardized; for instance, $\pi_{hh}$ is not considered a paraproduct in some texts.
We observe that the high-low and low-high paraproducts can be written as wavelet multipliers:

$$\pi_{hl}(f,g) = W^{-1}[g]_P Wf; \quad \pi_{lh}(f,g) = W^{-1}[f]_P Wg. \qquad (46)$$

The high-high paraproduct cannot be written in this way, but we have the useful relationship

$$\pi_{hh}(W^{-1}a_P Wf, g) = \pi_{hh}(f, W^{-1}a_P Wg). \qquad (47)$$

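One can also write paraproducts using both lacunary and non-lacunary tiles, for instance

$$\int \pi_{hh}(f,g)\,h = \sum_{I \text{ dyadic}} |I|^{-1/2}\,\langle f, \phi_{P^+(I)}\rangle\,\langle g, \phi_{P^+(I)}\rangle\,\langle h, \phi_{P^0(I)}\rangle. \qquad (48)$$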

The bilinear Hilbert transform turns out to have a similar expansion, but with the sum 
ranging over a larger collection of triples of tiles than the ones for paraproducts (specif- 
ically, the tiles need not be lacunary or non-lacunary, and range over a three-parameter 
family rather than a two-parameter one). See e.g. [^], [^, |^ . 



From the Carleson embedding theorem we have paraproduct estimates:

Corollary 5.2 ($L^2 \times BMO \to L^2$ paraproduct estimates). We have

$$\|\pi_{hl}(f,g)\|_2 \lesssim \|f\|_2\,\|g\|_\infty$$

and

$$\|\pi_{hh}(f,g)\|_2,\ \|\pi_{lh}(f,g)\|_2 \lesssim \|f\|_2\,\|g\|_{BMO}$$

for all $f, g \in S$.

In other words, paraproducts map $L^2 \times L^\infty$ to $L^2$ (just as the pointwise product does), and the $L^\infty$ factor can be relaxed to BMO as long as one only considers high frequencies of a BMO function. Note that the low frequency portion of a BMO function is somewhat ill-defined since a BMO function might only be determined up to a constant.

Proof. The first bound follows from (46) since $[g]_P$ is bounded by $\|g\|_\infty$. To prove the second bound, it suffices by (44) to consider $\pi_{lh}$. By orthogonality we have

$$\|\pi_{lh}(f,g)\|_2 = \Big(\sum_{P \in \mathbf{P}^+} |Wg(P)|^2\,|[f]_P|^2\Big)^{1/2}.$$

The claim now follows from Carleson embedding (Lemma 5.1). ■

Lemma 5.3 ($BMO \times BMO \to BMO$ paraproduct estimate). We have

$$\|\pi_{hh}(f,g)\|_{BMO} \lesssim \|f\|_{BMO}\,\|g\|_{BMO}$$

for all $f, g \in S$.

For the other paraproducts $\pi_{hl}$, $\pi_{lh}$ one must place the "low" factor in $L^\infty$ rather than BMO, as in Lemma 5.2; this is again an easy consequence of (46).



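Proof. By Lemma ^]3| with $p = 1$ it suffices to show $\int |\Pi_{\mathrm{Tree}(I)}\,\pi_{hh}(f,g)| \lesssim |I|$ for all dyadic intervals $I$.

Fix $I$, and expand the left-hand side as

$$\int \Big|\sum_{P \in \mathbf{P}^+} Wf(P)\,Wg(P)\,\Pi_{\mathrm{Tree}(I)}\Big(\frac{\chi_{I_P}}{|I_P|}\Big)\Big|.$$

The summand vanishes unless $P \in \mathrm{Tree}(I)$. Thus we can write the above as

$$\int \Big|\Pi_{\mathrm{Tree}(I)}\Big(\sum_{P \in \mathrm{Tree}(I)} Wf(P)\,Wg(P)\,\frac{\chi_{I_P}}{|I_P|}\Big)\Big|.$$

Since $\Pi_{\mathrm{Tree}(I)}$ is bounded on $L^1$, we can bound this by

$$\lesssim \int \Big|\sum_{P \in \mathrm{Tree}(I)} Wf(P)\,Wg(P)\,\frac{\chi_{I_P}}{|I_P|}\Big|.$$

Putting the absolute values inside and performing the integration, we can bound this by

$$\lesssim \sum_{P \in \mathrm{Tree}(I)} |Wf(P)|\,|Wg(P)|.$$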

The claim then follows from Cauchy-Schwarz and (J^). ■ 

5.1. Weak-type estimates. We now show how to use the above machinery to prove $L^{p,\infty}$ paraproduct estimates, where $\|\cdot\|_{L^{p,\infty}}$ is the weak (quasi-)norm

$$\|f\|_{L^{p,\infty}} := \sup_{\lambda > 0} \lambda\,|\{x : |f(x)| > \lambda\}|^{1/p}.$$

We need the following basic characterization of weak $L^p$ for $0 < p < \infty$:

Lemma 5.4. Let $0 < p < \infty$ and $A > 0$. Then the following statements are equivalent up to constants:

(i) $\|f\|_{L^{p,\infty}} \lesssim A$.

(ii) For every set $E$ with $0 < |E| < \infty$, there exists a subset $E' \subset E$ with $|E'| \sim |E|$ and $|\langle f, \chi_{E'}\rangle| \lesssim A\,|E|^{1/p'}$.

Here $p'$ is defined by $1/p' + 1/p = 1$ (note that $p'$ can be negative!).

Proof. To see that (i) implies (ii), set

$$E' := E \setminus \{x : |f(x)| > C A\,|E|^{-1/p}\}.$$

If $C$ is a sufficiently large constant, then (i) implies $|E'| \sim |E|$, and the claim follows.

To see that (ii) implies (i), let $\lambda > 0$ be arbitrary and set $E := \{x : \mathrm{Re}(f(x)) > \lambda\}$. Then by (ii) we have

$$\lambda\,|E| \sim \lambda\,|E'| \lesssim A\,|E|^{1/p'},$$

and (i) easily follows (replacing Re by $-$Re, Im, $-$Im as necessary). ■

When $p > 1$ we can always set $E' = E$, and the above lemma then reflects the duality between $L^{p,\infty}$ and $L^{p',1}$. However for $p \le 1$ the freedom to set $E'$ to be smaller than $E$ is necessary (since $f$ need not be locally integrable).
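To make the lemma concrete, here is a small sketch for step functions on a uniform grid. It is an illustration under assumptions: the function names are hypothetical, the set $E$ is taken to be the whole grid, and the example $f(x) = x^{-1/p}$ on $(0,1]$ is chosen because it lies in weak $L^p$ with quasi-norm 1 but not in $L^p$.

```python
def weak_lp_quasinorm(values, cell, p):
    """sup over lam > 0 of lam * |{|f| > lam}|^(1/p) for a step function.

    For a step function the supremum is approached as lam increases to one of
    the values taken by |f|, so it is enough to scan the sorted values.
    """
    v = sorted((abs(x) for x in values), reverse=True)
    return max(v[i] * ((i + 1) * cell) ** (1.0 / p) for i in range(len(v)))


def good_subset(values, cell, p, A, C):
    """Indices of E' = E minus {|f| > C * A * |E|^(-1/p)}, with E the whole grid."""
    measure_E = len(values) * cell
    threshold = C * A * measure_E ** (-1.0 / p)
    return [i for i, x in enumerate(values) if abs(x) <= threshold]


if __name__ == "__main__":
    n, p = 1024, 0.5
    cell = 1.0 / n
    # f(x) = x^(-1/p), sampled at the right endpoint of each cell.
    f = [((i + 1) * cell) ** (-1.0 / p) for i in range(n)]
    A = weak_lp_quasinorm(f, cell, p)
    E_prime = good_subset(f, cell, p, A=1.0, C=4.0)
    pairing = sum(f[i] for i in E_prime) * cell
    print("weak quasi-norm:", A)
    print("|E'| =", len(E_prime) * cell, " <f, chi_E'> =", pairing)
```

Removing the set where $|f|$ is large costs at most a fraction $C^{-p}$ of $E$ by Chebyshev, and on the remainder the pairing is trivially bounded by $CA|E|^{1/p'}$; this is the step (i) implies (ii) in the proof.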

A typical application of Lemma 5.4 is
Proposition 5.5 ($L^p \times L^q \to L^{r,\infty}$ paraproduct estimates). We have

$$\|\pi_{hl}(f,g)\|_{r,\infty} \lesssim \|f\|_p\,\|g\|_q$$

whenever $1 < p, q < \infty$ and $1/p + 1/q = 1/r$. Similarly for $\pi_{lh}$, $\pi_{hh}$.

Note that $r$ can be less than 1. One can strengthen the weak $L^r$ to strong $L^r$ by
multilinear interpolation (see e.g. 0, [0, The continuous version of these dyadic

paraproduct estimates can be found in, e.g. [jl4|-||T9|; the version for r < 1 was first 



proven in (with some special cases in 0, 10). It is possible to obtain the continuous 
estimates from the dyadic ones via averaging arguments, but we shall not do so here. 
Proof. We first consider $\pi_{hl}$. We may normalize $\|f\|_p = \|g\|_q = 1$; we may assume that $f$ and $g$ are dyadic test functions. Let $E$ be a measurable set with $0 < |E| < \infty$. We need to find a set $E' \subset E$ with $|E'| \sim |E|$ such that

$$|\langle \pi_{hl}(f,g), \chi_{E'}\rangle| \lesssim |E|^{1/r'}. \qquad (49)$$

By rescaling (using the hypothesis $1/p + 1/q = 1/r$) we may take $|E| \sim 1$. We choose $E'$ as

$$E' := E \setminus \{x : M|f|^p(x) + M|g|^q(x) > C\}.$$

If $C$ is large enough, then $|E| \sim |E'|$ by the Hardy-Littlewood maximal inequality (see e.g. 0).

We wish to show (49) with $|E| \sim 1$. By (||) it suffices to show that $|\sum_{P \in \mathbf{P}} Wf(P)\,[g]_P\,W\chi_{E'}(P)| \lesssim 1$ for all convex collections $\mathbf{P}$ of tiles.

We may remove all tiles in $\mathbf{P}$ for which $I_P \cap E' = \emptyset$, since $W\chi_{E'}$ vanishes on these tiles. For any remaining tile $P$ we then have

$$\int_{I_P} |f|^p \lesssim |I_P| \qquad (50)$$

by construction of $E'$. Similarly, we have

$$\|g\|_{\mathrm{mean}^*(\mathbf{P})} = \sup_{P \in \mathbf{P}} [|g|]_P \le \sup_{P \in \mathbf{P}} [|g|^q]_P^{1/q} \lesssim 1.$$

Thus we reduce to showing

$$\sum_{P \in \mathbf{P}} |Wf(P)|\,|W\chi_{E'}(P)| \lesssim 1. \qquad (51)$$

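From (50) and the boundedness of the Littlewood-Paley square function (see e.g. ||51| ) we have

$$\int_{I_P} |S\,\Pi_{\mathrm{Tree}(I_P)} f|^p \lesssim \int_{I_P} |\Pi_{\mathrm{Tree}(I_P)} f|^p \lesssim \int_{I_P} |f|^p \lesssim |I_P|$$

for all $P \in \mathbf{P}$, thus

$$\int_{I_P} \Big(\sum_{P' \in \mathrm{Tree}(I_P)} |Wf(P')|^2\,\frac{\chi_{I_{P'}}}{|I_{P'}|}\Big)^{p/2} \lesssim |I_P|.$$

Applying Chebyshev's inequality and Corollary ^]2| we thus see that $\|\,|Wf|^2\|_{\mathrm{size}^*(\mathbf{P})} \lesssim 1$. Also, we have

$$\|\,|W\chi_{E'}|^2\|_{\mathrm{size}^*(\mathbf{P})} \lesssim \|\chi_{E'}\|_{BMO}^2 \lesssim \|\chi_{E'}\|_\infty^2 \lesssim 1.$$

Thus we can find an $n = O(1)$ such that

$$\|\,|Wf|^2\|_{\mathrm{size}^*(\mathbf{P}_n)} \le 2^{2n} \quad \text{and} \quad \|\,|W\chi_{E'}|^2\|_{\mathrm{size}^*(\mathbf{P}_n)} \le 2^{2np/s}$$

where we have set $\mathbf{P}_n := \mathbf{P}$, and $s > 1$ is an exponent close to 1 to be chosen later.

By a finite number of applications of Lemma 4.1 with $a := |Wf|^2$ or $a := |W\chi_{E'}|^2$ we may partition $\mathbf{P}_n = \bigcup_{T \in \mathbf{T}_n} T \cup \mathbf{P}_{n-1}$, where $\mathbf{P}_{n-1}$ is a convex collection of tiles such that

$$\|\,|Wf|^2\|_{\mathrm{size}^*(\mathbf{P}_{n-1})} \le 2^{2(n-1)} \quad \text{and} \quad \|\,|W\chi_{E'}|^2\|_{\mathrm{size}^*(\mathbf{P}_{n-1})} \le 2^{2(n-1)p/s}$$

and $\mathbf{T}_n$ is a collection of convex trees with disjoint spatial supports such that either

$$\int_{I_T} |\Pi_T f|^p \gtrsim 2^{np}\,|I_T|$$

or

$$\int_{I_T} |\Pi_T \chi_{E'}|^s \gtrsim 2^{np}\,|I_T|$$

for each $T \in \mathbf{T}_n$. From fE2f) we thus have

$$\int (Mf)^p + \int (M\chi_{E'})^s \gtrsim 2^{np} \sum_{T \in \mathbf{T}_n} |I_T|;$$

from our assumptions on $f$, $\chi_{E'}$ and the Hardy-Littlewood maximal inequality (see e.g. 5l| ) we thus have $\sum_{T \in \mathbf{T}_n} |I_T| \lesssim 2^{-np}$. We now return to (51), and estimate the contribution of the trees in $\mathbf{T}_n$ by

$$\sum_{T \in \mathbf{T}_n} \sum_{P \in T} |Wf(P)|\,|W\chi_{E'}(P)|.$$

Applying Cauchy-Schwarz followed by the size control on $Wf$ and $W\chi_{E'}$, we may bound this by

$$\sum_{T \in \mathbf{T}_n} |I_T|\,\|\,|Wf|^2\|_{\mathrm{size}(T)}^{1/2}\,\|\,|W\chi_{E'}|^2\|_{\mathrm{size}(T)}^{1/2} \lesssim \sum_{T \in \mathbf{T}_n} |I_T|\,2^n\,2^{np/s} \lesssim 2^{-np}\,2^n\,2^{np/s}.$$

We now turn to the contribution of the tiles in $\mathbf{P}_{n-1}$. We may iterate the above procedure, decomposing $\mathbf{P}_{n-1}$ into $\mathbf{T}_{n-1}$ and $\mathbf{P}_{n-2}$, and continue in this fashion until we are left with a collection of tiles $\mathbf{P}_{-\infty}$ with size zero, which we can discard. Summing up, we can thus control the left-hand side of (51) by $\lesssim \sum_{n \le O(1)} 2^{-np}\,2^n\,2^{np/s}$. If one chooses $s$ sufficiently close to 1, then this sum converges, and we are done.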

A similar argument handles $\pi_{lh}$. The remaining paraproduct $\pi_{hh}$ then follows from (45) and Hölder's inequality (which is still valid for $r < 1$). ■

One can modify the above argument to obtain the corresponding estimate for the 
bilinear Hilbert transform (in the Walsh model, at least); see e.g. [^, A 



difficulty in that case is that the tiles are no longer lacunary, and one cannot guarantee the 
spatial disjointness of the trees T in T„. However one can still make the trees essentially 
disjoint in phase space, but then one can only use estimates to control X^TgTn \^t\ 
instead of estimates. Because of this, the above strategy only seems to work for 
the bilinear Hilbert transform when r > 2/3; it appears that one needs very different 
techniques to handle the remaining case 1/2 < r < 2/3. 
6. Calderón-Zygmund operators, and T(b) theorems

We now consider the theory of Calderón-Zygmund operators, and specifically the aspects of the theory related to $T(1)$ and $T(b)$ theorems.

For simplicity we shall restrict ourselves to model operators of the form

$$Tf(x) := \int K(x,y)\,f(y)\,dy$$

where $K$ is a locally integrable function on $(x,y) \in [0,2^M] \times [0,2^M]$ which obeys the kernel condition

$$|K(x,y)| \lesssim \frac{1}{|x-y|} \qquad (52)$$

and the perfect dyadic Calderón-Zygmund conditions

$$|K(x,y) - K(x',y)| + |K(y,x) - K(y,x')| = 0 \qquad (53)$$

whenever $x, x' \in I$ and $y \in J$ for some disjoint dyadic intervals $I$ and $J$. Equivalently, $K$ is constant on all rectangles $\{I \times J : I, J \text{ are siblings}\}$. We also impose the technical truncation conditions^ that $K$ vanishes on all diagonal squares $I \times I$ where $|I| = 2^{-M}$, and also vanishes on the squares $[0,2^{M-1}] \times [2^{M-1},2^M]$ and $[2^{M-1},2^M] \times [0,2^{M-1}]$.

We refer to operators $T$ of the above form as perfect dyadic Calderón-Zygmund operators. These operators can be thought of as the dyadic analogue of truncated Calderón-Zygmund operators, where the cancellation conditions are perfect. (For ordinary Calderón-Zygmund operators one can bound the left-hand side of (53) by something like $O(|x-x'|/|x-y|^2)$.)

Let $T$ be a perfect dyadic Calderón-Zygmund operator. From (53) we observe that $T$ maps $S$ to $S$, and furthermore if $I$ is any dyadic interval and $f \in S_0(I)$, then $Tf$ is supported in $I$. Similarly for $T^*$. We shall use this cancellation heavily in the sequel.
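The support property just described can be seen in a small discrete model. The sketch below is an assumption-laden toy: it works on $[0,1)$ with $2^k$ cells (rescaled from the paper's interval), builds a random kernel that is constant on every product of sibling dyadic intervals, imposes only the diagonal truncation (not the vanishing on the two top-level off-diagonal squares), and uses hypothetical names throughout.

```python
import random


def perfect_dyadic_kernel(k, seed=0):
    """Random kernel on 2^k x 2^k cells that is constant on sibling rectangles I x J.

    For cells i != j, d is the number of binary digits needed to separate them, so
    the smallest dyadic interval containing both has 2^d cells and its two children
    (one containing i, one containing j) are siblings.
    """
    rng = random.Random(seed)
    n = 1 << k
    h = 1.0 / n
    coeff = {}
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                           # truncation on the diagonal cells
            d = (i ^ j).bit_length()
            key = (d, i >> (d - 1), j >> (d - 1))  # identifies the sibling pair (I, J)
            if key not in coeff:
                # |x - y| < 2^d h on this rectangle, so this keeps |K| <= 1/|x - y|.
                coeff[key] = rng.uniform(-1.0, 1.0) / ((1 << d) * h)
            K[i][j] = coeff[key]
    return K


def apply_kernel(K, f):
    """Tf(x) = integral of K(x, y) f(y) dy for f constant on the cells."""
    n = len(f)
    h = 1.0 / n
    return [h * sum(K[i][j] * f[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    k = 4
    K = perfect_dyadic_kernel(k, seed=1)
    # A mean-zero function supported on the dyadic interval made of cells 8..11.
    f = [0.0] * (1 << k)
    f[8], f[9], f[10], f[11] = 1.0, 1.0, -1.0, -1.0
    Tf = apply_kernel(K, f)
    outside = max(abs(Tf[i]) for i in range(1 << k) if not 8 <= i < 12)
    print("max |Tf| outside the interval:", outside)
```

For a point $x$ outside $I$, the kernel $K(x,\cdot)$ is constant on $I$, so integrating it against a mean-zero function on $I$ gives zero; the same holds for the transposed kernel.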

Examples of perfect dyadic Calderón-Zygmund operators include multiplier operators $W^{-1}a_PW$ and paraproducts $f \mapsto \pi(a, f)$. It turns out that these are essentially the only such operators. Indeed, we have the well-known splitting (see 0, |Q) of a perfect dyadic Calderón-Zygmund operator into three parts: the diagonal part, the $T(1)$ paraproduct, and the $T^*(1)$ paraproduct:

Lemma 6.1. If $T$ is a perfect Calderón-Zygmund operator, then we have

$$Tf \equiv W^{-1}\langle T\phi_P, \phi_P\rangle Wf + \pi_{hl}(T(1), f) + \pi_{hh}(T^*(1), f) \qquad (54)$$

for all $f \in S$, where $f \equiv g$ denotes the statement that $Wf = Wg$.



^^The continuous analogue of these conditions would be that $K(x-y)$ vanishes when $|x-y| < 2^{-M}$ or $|x-y| > 2^M$. It is possible to formulate $T(b)$ theorems which do not require truncated operators, but this introduces some additional technicalities which are not relevant to the present discussion, and so we have chosen to ignore the issue for expository reasons.

^^This is equivalent to $f - g$ being constant on $[0, 2^M]$.
Proof. We need to show that

$$\langle Tf, \phi_Q\rangle = \langle T\phi_Q, \phi_Q\rangle\,Wf(Q) + W(T(1))(Q)\,[f]_Q + \sum_{P \in \mathbf{P}^+} W(T^*(1))(P)\,Wf(P)\,[\phi_Q]_P \qquad (55)$$

for all $f \in S$ and $Q \in \mathbf{P}^+$.

When $f = 1$ the identity is clear, so by subtracting off a multiple of 1 if necessary we may assume $f \in S_0$, thus $f = \sum_P Wf(P)\,\phi_P$. We can then decompose

$$\langle Tf, \phi_Q\rangle = \sum_{P \in \mathbf{P}^+} Wf(P)\,\langle T\phi_P, \phi_Q\rangle.$$

If $I_P$ and $I_Q$ are disjoint then $\langle T\phi_P, \phi_Q\rangle = 0$ since $T$ is a perfect dyadic Calderón-Zygmund operator. Thus we may partition the sum into the portions $P = Q$, $Q <' P$, or $P <' Q$.

The diagonal term $P = Q$ is the first term in (55). Now consider the $Q <' P$ portion. We write this as

$$\Big\langle \sum_{P \in \mathbf{P}^+ : Q <' P} Wf(P)\,\phi_P,\ T^*\phi_Q \Big\rangle.$$

The function $T^*\phi_Q$ is supported on $I_Q$, while

$$\sum_{P \in \mathbf{P}^+ : Q <' P} Wf(P)\,\phi_P = f - \sum_{P \in \mathbf{P}^+ : Q \not<' P} Wf(P)\,\phi_P$$

is constant on $I_Q$ and has the same mean as $f$ on $I_Q$; we thus have

$$\Big\langle \sum_{P \in \mathbf{P}^+ : Q <' P} Wf(P)\,\phi_P,\ T^*\phi_Q \Big\rangle = \langle [f]_Q, T^*\phi_Q\rangle = \langle T(1), \phi_Q\rangle\,[f]_Q$$

which is the second term in (55).

Finally, we consider the $P <' Q$ term. Observe that the $P$ summation in (55) vanishes unless $P <' Q$. It thus suffices to show that

$$Wf(P)\,\langle T\phi_P, \phi_Q\rangle = W(T^*(1))(P)\,Wf(P)\,[\phi_Q]_P.$$

But this follows since $T\phi_P$ is supported on $I_P$ and $\phi_Q$ is constant on $I_P$. ■

Corollary 6.2 (Dyadic global T(1) theorem). Let $T$ be a perfect dyadic Calderón-Zygmund operator such that

$$\|T(1)\|_{BMO},\ \|T^*(1)\|_{BMO} \lesssim 1$$

and we have the weak boundedness property

$$|\langle T\phi_P, \phi_P\rangle| \lesssim 1 \text{ for all } P \in \mathbf{P}^+. \qquad (56)$$

Then $T$ is bounded on $L^2$.

Note that the converse of this theorem is easy: if $T$ is bounded on $L^2$, then we certainly have the weak boundedness property, and by ( ^3])

$$\|T(1)\|_{BMO} = \sup_I \sup_{a \in S_0(I) : \|a\|_2 = 1} |I|^{-1/2}\,|\langle T(1), a\rangle| = \sup_I \sup_{a \in S_0(I) : \|a\|_2 = 1} |I|^{-1/2}\,|\langle T(\chi_I), a\rangle| \lesssim 1$$

and similarly for $T^*(1)$. Indeed we observe that $T$ and $T^*$ must map $L^\infty$ to BMO.
Proof. From the conditions on the kernel $K$ and duality it suffices to show that $|\langle Tf, g\rangle| \lesssim \|f\|_2\,\|g\|_2$ for all $f, g \in S$. By splitting $[0,2^M]$ into two intervals it suffices to show this when $f \in S([0,2^{M-1}])$ or $f \in S([2^{M-1},2^M])$.

Without loss of generality we may assume that $f \in S([0,2^{M-1}])$. From (54), Corollary 5.2, and the hypotheses on $T$ we see that this estimate holds for all $g \in S_0$. Also, from the truncation hypothesis the claim trivially holds for $g = \chi_{[2^{M-1},2^M]}$. Since $S$ is spanned by $S_0$ and $\chi_{[2^{M-1},2^M]}$, the claim follows. ■

We can rephrase Corollary 6.2 as an equivalent "local" version:

Corollary 6.3 (Dyadic local T(1) theorem). Let $T$ be a perfect dyadic Calderón-Zygmund operator such that

$$\|T(\chi_{I_P})\|_{L^1(I_P)},\ \|T^*(\chi_{I_P})\|_{L^1(I_P)} \lesssim |I_P| \text{ for all } P \in \mathbf{P}^+. \qquad (57)$$

Then $T$ is bounded on $L^2$.

Proof. From (57) we see that $|\int_{I_P \times I_P} K(x,y)\,dx\,dy| \lesssim |I_P|$. From (52) we thus have that

$$\Big|\int_{I_P \times I_P} \phi_P(x)\,K(x,y)\,\phi_P(y)\,dx\,dy\Big| \lesssim 1,$$

or in other words that (56) holds. By Corollary 6.2 and symmetry it thus suffices to show that $T(1) \in BMO$, or equivalently (by Corollary ^]3| and duality, cf. (^3])) that

$$|\langle T(1), h_I\rangle| \lesssim |I|\,\|h_I\|_\infty$$

whenever $I$ is a dyadic interval and $h_I \in S_0(I)$. But since $T^*(h_I)$ is supported on $I$, we have $\langle T(1), h_I\rangle = \langle T\chi_I, h_I\rangle$, and the claim follows from (57). ■

Note that the converse of the above Corollary is immediate from Hölder's inequality. One can also deduce Corollary 6.2 from Corollary 6.3, but we leave this to the reader.

We shall consider generalizations of the global and local T(l) theorem next, after some 
preliminaries on accretivity. 

6.1. Accretivity and one-sided T(b) theorems. Let $b \in S$ be a complex-valued function, and $\mathbf{P} \subset \mathbf{P}^+$ be a collection of tiles. We say that $b$ is pseudo-accretive on $\mathbf{P}$ if

$$|[b]_P| \gtrsim 1 \text{ for all } P \in \mathbf{P}. \qquad (58)$$

If we in addition have the property

$$|[b]_{P_l}|,\ |[b]_{P_r}| \gtrsim 1 \qquad (59)$$

for the two children^ $P_l$, $P_r$ of tiles $P \in \mathbf{P}$, we say that $b$ is strongly pseudo-accretive on $\mathbf{P}$. Note that we are not assuming any $L^\infty$ control on $b$ in the above definition.

If $b$ is pseudo-accretive on the entire tile set $\mathbf{P}^+$, we simply say that $b$ is pseudo-accretive. Examples of pseudo-accretive functions include the accretive functions, for which $\mathrm{Re}\,b(x) \gtrsim 1$ for all $x \in [0, 2^M]$.

The $T(1)$ theorem can now be generalized to "one-sided $T(b)$ theorems" in which we control $T(b)$ and $T^*(1)$. We give the dyadic version of an argument of Semmes |5^ (who considered the case $T(b) \in BMO$, $T^*(1) = 0$):



^^For this to be well defined, $\mathbf{P}$ cannot contain any tiles with minimal length $|I_P| = 2^{-M}$.
Theorem 6.4 (One-sided global T(b) theorem). Let $b$ be a pseudo-accretive function on $\mathbf{P}^+$ with $\|b\|_{BMO} \lesssim 1$. Let $T$ be a perfect dyadic Calderón-Zygmund operator obeying the weak boundedness property (56) such that

$$\|T(b)\|_{BMO},\ \|T^*(1)\|_{BMO} \lesssim 1.$$

Then $T$ is bounded on $L^2$.

Proof. From the dyadic $T(1)$ theorem (Corollary 6.2) it suffices to show that $\|T(1)\|_{BMO} \lesssim 1$, or in other words that $|W(T(1))|^2$ has bounded maximal size (i.e. is a Carleson measure).

From (54) and (46)

$$T(b) \equiv W^{-1}\langle T\phi_P, \phi_P\rangle Wb + W^{-1}[b]_P\,W T(1) + \pi_{hh}(T^*(1), b). \qquad (60)$$

From (58) we may thus solve for $T(1)$:

$$T(1) \equiv W^{-1}[b]_P^{-1}\,W\big[T(b) - W^{-1}\langle T\phi_P, \phi_P\rangle Wb - \pi_{hh}(T^*(1), b)\big] \qquad (61)$$

and the claim follows from (58), (56), the hypotheses $\|T(b)\|_{BMO}, \|b\|_{BMO} \lesssim 1$, and Lemma 5.2. ■

Note that one only needs $b$ in BMO in the above argument instead of the more usual $L^\infty$.

One drawback to the above theorem is that it requires the function $b$ to be pseudo-accretive. Fortunately, it is not too difficult to construct pseudo-accretive functions. The following basic lemma says that if a function has large mean, then it is pseudo-accretive on a non-trivial set of tiles. (Equivalently, if a function has small mean on too many small tiles, then it must have small mean globally.)

Lemma 6.5. Let $T_0 \subset \mathbf{P}^+$ be a convex tree, and let $b$ be a function such that

$$\|\Pi_{T_0} b\|_2 \le C_0\,|I_{T_0}|^{1/2} \qquad (62)$$

and $|[b]_{I_{T_0}}| \ge \delta$ for some $C_0, \delta > 0$. Then there exists $0 < \varepsilon \ll 1$ depending only on $C_0$ and $\delta$ and a family $\mathbf{T}$ of disjoint convex sub-trees of $T_0$ whose tops form a $(1-\varepsilon)$-packing of $T_0$, and such that $|[b]_P| \ge \varepsilon$ for all $P \in T_0 \setminus \bigcup_{T \in \mathbf{T}} T$. Furthermore we have $|[b]_{P_T}| < \varepsilon$ for all $T \in \mathbf{T}$.

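Proof. Let $\mathbf{P}$ denote those tiles in $T_0$ for which $|[b]_P| < \varepsilon$, and which are maximal with respect to the ordering $<'$. Clearly the tiles in $\mathbf{P}$ have disjoint spatial intervals and obey

$$\Big|\int_{I_P} b\Big| < \varepsilon\,|I_P|.$$

To prove the lemma it will suffice to show the $(1-\varepsilon)$-packing property. Suppose for contradiction that

$$\Big|\bigcup_{P \in \mathbf{P}} I_P\Big| > (1-\varepsilon)\,|I_{T_0}|. \qquad (63)$$

Using the identity $\int_{I_P} b = \int_{I_P} ([b]_{I_{T_0}} + \Pi_{T_0} b)$ and summing in $P$ we obtain

$$\Big|\int_{\bigcup_{P \in \mathbf{P}} I_P} [b]_{I_{T_0}} + \Pi_{T_0} b\Big| \le \varepsilon\,\Big|\bigcup_{P \in \mathbf{P}} I_P\Big| \le \varepsilon\,|I_{T_0}|;$$

if $\varepsilon$ is sufficiently small with respect to $\delta$, we thus see that

$$\Big|\int_{\bigcup_{P \in \mathbf{P}} I_P} \Pi_{T_0} b\Big| \ge \frac{\delta}{2}\,|I_{T_0}|.$$

Since $\Pi_{T_0} b$ has mean zero on $I_{T_0}$, we thus have

$$\Big|\int_{I_{T_0} \setminus \bigcup_{P \in \mathbf{P}} I_P} \Pi_{T_0} b\Big| \ge \frac{\delta}{2}\,|I_{T_0}|.$$

But this will contradict (62) and (63) by Cauchy-Schwarz, if $\varepsilon$ is sufficiently small. ■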
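The selection of maximal "light" tiles in the proof is a top-down stopping-time search, which the following sketch mimics in a toy model. Assumptions: $b$ is constant on $2^k$ cells of $[0,1)$, dyadic intervals are represented by hypothetical half-open cell ranges `(lo, hi)`, single cells are allowed as tops (the paper's restriction at the finest scale is ignored), and the function names are illustrative.

```python
def select_light_tops(b, eps):
    """Maximal dyadic intervals (as cell ranges) on which |mean of b| < eps.

    The search descends from the whole interval and stops at the first interval
    with small mean, so every returned interval is maximal and they are disjoint.
    """
    tops = []

    def visit(lo, hi):
        avg = sum(b[lo:hi]) / (hi - lo)
        if abs(avg) < eps:
            tops.append((lo, hi))      # stop: this is the top of a removed sub-tree
            return
        if hi - lo > 1:
            mid = (lo + hi) // 2
            visit(lo, mid)
            visit(mid, hi)

    visit(0, len(b))
    return tops


def packing_fraction(tops, n):
    """Fraction of the top interval covered by the selected intervals."""
    return sum(hi - lo for lo, hi in tops) / n


if __name__ == "__main__":
    # b oscillates with mean zero on [0, 1/4) and equals 1 elsewhere, so [b] = 0.75.
    b = [1.0, -1.0, 1.0, -1.0] + [1.0] * 12
    tops = select_light_tops(b, eps=0.25)
    print(tops, packing_fraction(tops, len(b)))
```

In this example the mean of $b$ is large, and correspondingly the light intervals cover only a quarter of the top interval, in line with the packing conclusion of the lemma.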

This lemma combines nicely with Lemma ^TT]. Together, these lemmata heuristically 
allow us to treat "large mean" as being equivalent to "pseudo-accretive" , at least for the 
purposes of placing something in BMO. As an application we now use this lemma to give 
a localized version of Theorem |6.4| . Similar results^ have been used to solve the Kato 
problem in higher dimensions (see e.g. 0): 

Theorem 6.6 (Local one-sided T(b) theorem). Let $T$ be a perfect Calderón-Zygmund operator obeying $\|T^*(1)\|_{BMO} \lesssim 1$ and (56). Suppose also that for every $P \in \mathbf{P}^+$ there exists a function $b_P \in S(I_P)$ with the normalization condition

$$[b_P]_P = 1$$

and the bounds

$$\int_{I_P} |b_P|^2 + |Tb_P|^2 \lesssim |I_P|$$

for all $P \in \mathbf{P}^+$. Then $T$ is bounded on $L^2$.

The weak boundedness condition (56) can actually be removed; see the remarks after Theorem 6.8. Informally, this theorem asserts that to prove the boundedness of an operator $T$ it actually suffices to establish boundedness for a single function $b_P$ for each interval $I_P$, provided that $b_P$ is not degenerate (in the sense that its mean is large) and provided that $T^*(1)$ is under control. In the next section we shall remove the condition on $T^*(1)$, obtaining a "two-sided" version of this theorem.

Proof Again it suffices to show that T(l) is in BMO. By Lemma it suffices to show 
that for every complete tree T we have 

mmmi' < \it\ (64) 

for some collection T of disjoint convex trees in T whose tops form a (1 — £:)-packing of 
T for some e > 0. 

Fix T. By Lemma we can indeed find such a collection T with the additional 
property that b is pseudo-accretive on T\ [Jt'st '^^^ claim ( [0^ ) then follows from the 
argument used to prove Corollary I 



^^More precisely, the solution to the Kato problem requires a matrix-valued analogue of this theorem in which the $b_P$ are matrix valued, and $T$ maps matrices to vectors. The argument then requires an additional subtlety, namely a preliminary partition of the tile set $\mathbf{P}^+$ into $O(1)$ pieces, where on each piece the vector-valued coefficients $W(T(1))(P)$ lie in a narrow conical region. See M.

The above arguments do not extend well to two-sided situations in which one controls $T(b_1)$ and $T^*(b_2)$ (unless one of $b_1$, $b_2$ is close to a constant, e.g. in BMO norm). In order to handle the general case we need adapted Haar bases, to which we now turn.

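6.2. Adapted Haar bases, and two-sided T(b) theorems. Let $\mathbf{P}$ be a collection of tiles, and let $b$ be a function which is strongly pseudo-accretive on $\mathbf{P}$. For each $P \in \mathbf{P}$, we define the adapted Haar wavelet $\phi_P^b$ (introduced in [|^; see also 0]) by

$$\phi_P^b := |I_P|^{-1/2}\,\frac{[b]_{P_r}}{[b]_P}\,\chi_{I_{P_l}} - |I_P|^{-1/2}\,\frac{[b]_{P_l}}{[b]_P}\,\chi_{I_{P_r}}. \qquad (65)$$

Observe that this collapses to $\phi_P$ if $b$ is constant on $I_P$. For non-constant $b$, $\phi_P^b$ is no longer mean zero, but one can easily verify that $\phi_P^b$ still obeys the weighted mean zero condition

$$\int b\,\phi_P^b = 0. \qquad (66)$$

As a consequence we have the orthogonality property

$$\langle \phi_P^b, b\,\phi_Q^b\rangle = 0 \text{ for all distinct } P, Q \in \mathbf{P}. \qquad (67)$$

From (65) we see that

$$\int b\,\phi_P^b\,\phi_P^b = \frac{[b]_{P_l}\,[b]_{P_r}}{[b]_P} = \frac{2\,[b]_{P_l}\,[b]_{P_r}}{[b]_{P_l} + [b]_{P_r}}. \qquad (68)$$

In particular, from the strong pseudo-accretivity condition (59) we have the bound

$$|\langle \phi_P^b, b\,\phi_P^b\rangle| \gtrsim 1. \qquad (69)$$

It is interesting that this bound uses only the strong pseudo-accretivity of $b$, and in particular does not require $L^\infty$ control on $b$.

Define the dual adapted Haar wavelet $\psi_P^b$ by

$$\psi_P^b := \frac{b\,\phi_P^b}{\langle \phi_P^b, b\,\phi_P^b\rangle}.$$

By (67), (69) we thus have that $\langle \psi_P^b, \phi_Q^b\rangle = \delta_{PQ}$ where $\delta$ is the Kronecker delta. In particular we have the representation formula

$$f = \sum_{P \in \mathbf{P}} W_b f(P)\,\psi_P^b \qquad (70)$$

whenever $f$ is in the span of $\{\psi_P^b : P \in \mathbf{P}\}$, where the adapted wavelet coefficients $W_b f(P)$ are defined by

$$W_b f(P) := \langle f, \phi_P^b\rangle.$$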
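The algebra of adapted Haar wavelets is easy to test numerically. The sketch below assumes one specific normalisation, namely $\phi_P^b = |I_P|^{-1/2}\big(\tfrac{[b]_{P_r}}{[b]_P}\chi_{I_{P_l}} - \tfrac{[b]_{P_l}}{[b]_P}\chi_{I_{P_r}}\big)$, which is the one compatible with the weighted mean zero condition and with an identity of the form $W_b f(P) = Wf(P) - \tfrac{Wb(P)}{[b]_P}[f]_P$; other normalisations differ by a scalar. It also assumes a real-valued $b$ on $2^k$ cells of $[0,1)$, and all names are illustrative.

```python
import random


def mean(f, lo, hi):
    """Average of f over the cell range [lo, hi)."""
    return sum(f[lo:hi]) / (hi - lo)


def inner(f, g):
    """Integral of f*g over [0, 1) for functions constant on the cells."""
    return sum(x * y for x, y in zip(f, g)) / len(f)


def dyadic_intervals(n):
    """All dyadic cell ranges with at least two cells (so that both children exist)."""
    out, size = [], n
    while size >= 2:
        out += [(lo, lo + size) for lo in range(0, n, size)]
        size //= 2
    return out


def adapted_haar(b, lo, hi):
    """Adapted Haar wavelet for the interval [lo, hi), under the assumed normalisation."""
    n = len(b)
    mid = (lo + hi) // 2
    bl, br = mean(b, lo, mid), mean(b, mid, hi)
    bp = 0.5 * (bl + br)                  # [b]_P is the average of the children's means
    amp = ((hi - lo) / n) ** -0.5         # |I_P|^{-1/2}
    phi = [0.0] * n
    for i in range(lo, mid):
        phi[i] = amp * br / bp
    for i in range(mid, hi):
        phi[i] = -amp * bl / bp
    return phi


def dual_wavelet(b, lo, hi):
    """psi = b * phi / <phi, b phi>, so that <psi_P, phi_Q> is the Kronecker delta."""
    phi = adapted_haar(b, lo, hi)
    bphi = [x * y for x, y in zip(b, phi)]
    norm = inner(phi, bphi)
    return [v / norm for v in bphi]


if __name__ == "__main__":
    rng = random.Random(0)
    b = [rng.uniform(0.5, 1.5) for _ in range(16)]   # an accretive b
    tiles = dyadic_intervals(len(b))
    worst = 0.0
    for P in tiles:
        for Q in tiles:
            pairing = inner(dual_wavelet(b, *P), adapted_haar(b, *Q))
            worst = max(worst, abs(pairing - (1.0 if P == Q else 0.0)))
    print("largest deviation from biorthogonality:", worst)
```

The biorthogonality only uses that $\phi_P^b$ is constant on each child of $I_P$ and that $b\,\phi_Q^b$ integrates to zero, which is why no upper bound on $b$ enters.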
We have the following basic orthogonality property: 

Lemma 6.7. Let $T$ be a convex tree, and let $b$ be a function which is pseudo-accretive on $T$ and obeys the mean bound

$$\|\,|b|^2\|_{\mathrm{mean}^*(T)} \lesssim 1. \qquad (71)$$

Then for any function^19 $f \in S$ we have

$$\Big(\sum_{P \in T} |W_b f(P)|^2\Big)^{1/2} \lesssim \|f\|_2. \qquad (72)$$

In fact, the more general estimate

$$\Big(\sum_{P \in T} |W_b(b'f)(P)|^2\Big)^{1/2} \lesssim \|f\|_2\,\|\,|b'|^2\|_{\mathrm{mean}^*(T)}^{1/2} \qquad (73)$$

holds for any $f, b' \in S$.

Proof. From (^) and we observe that

$$\|\,|Wb|^2\|_{\mathrm{size}^*(T)} \lesssim 1. \qquad (74)$$

We first prove (72). From (65) and the identity $|I_P|^{-1/2}\,Wb(P) = [b]_{P_l} - [b]_P = [b]_P - [b]_{P_r}$, we obtain the identity

$$W_b f(P) = Wf(P) - \frac{Wb(P)}{[b]_P}\,[f]_P.$$

If we replace $W_b$ by $W$ then (72) follows from Bessel's inequality and the orthonormality of the Haar wavelets $\phi_P$. By the previous identity and the triangle inequality it thus suffices to show

$$\Big(\sum_{P \in T} \Big|\frac{Wb(P)}{[b]_P}\,[f]_P\Big|^2\Big)^{1/2} \lesssim \|f\|_2.$$

We may discard $[b]_P$ by pseudo-accretivity (58). The claim then follows from Carleson embedding (Lemma 5.1) and (74).

Now we prove (73). Let $\mathbf{Q}$ denote the collection of tiles in $\mathbf{P}^+$ which are children of tiles in $T$, but are not in $T$ itself. In order to ensure that the intervals $\{I_Q : Q \in \mathbf{Q}\}$ partition $I_T$ we will allow the tiles $Q$ to have spatial intervals $|I_Q| = 2^{-M-1}$; the partition property then follows from the convexity of $T$.

The function $b'f - \sum_{Q \in \mathbf{Q}} [b'f]_Q\,\chi_{I_Q}$ has mean zero on every interval $I_Q$, and is thus orthogonal to $\phi_P^b$ for every $P \in T$. We may thus freely replace $b'f$ by the averaged function $\sum_{Q \in \mathbf{Q}} [b'f]_Q\,\chi_{I_Q}$ in (73). By (72) it thus suffices to show that

$$\Big\|\sum_{Q \in \mathbf{Q}} [b'f]_Q\,\chi_{I_Q}\Big\|_2 \lesssim \|f\|_2\,\|\,|b'|^2\|_{\mathrm{mean}^*(T)}^{1/2}.$$

But from Cauchy-Schwarz we have

$$|[b'f]_Q|^2\,|I_Q| \lesssim \|f\|_{L^2(I_Q)}^2\,\|\,|b'|^2\|_{\mathrm{mean}(2Q)} \le \|f\|_{L^2(I_Q)}^2\,\|\,|b'|^2\|_{\mathrm{mean}^*(T)},$$

and the claim follows by summing in $Q$. ■

We can now give our main result, namely a dyadic local T(6) theorem. 



^19 It can easily be seen, with the aid of (70), that the estimate (72) can be reversed for all $f$ in the span of the $\phi_P$, but we will not use this.
Theorem 6.8 (Dyadic local T(b) theorem). Let $T$ be a perfect Calderón-Zygmund operator, and suppose that for each $P \in \mathbf{P}^+$ we can find functions $b_P^1$, $b_P^2$ in $S(I_P)$ obeying the normalization

$$[b_P^1]_P = [b_P^2]_P = 1 \qquad (75)$$

and the bounds

$$\int_{I_P} |b_P^1|^2 + |Tb_P^1|^2 + |b_P^2|^2 + |T^*b_P^2|^2 \lesssim |I_P|. \qquad (76)$$

Then $T$ is bounded on $L^2$.

This theorem is a stronger version of the local $T(b)$ theorem in (but for the dyadic setting with perfect cancellation), which required $L^\infty$ control in (76) instead of $L^2$ control. (This was generalized to BMO control and to non-doubling situations in [0). Also it required the global $T(b)$ theorem of David, Journé and Semmes (which we instead deduce as a corollary of Theorem 6.8). We make some further remarks after the proof of the theorem.

Proof. This proof is somewhat lengthy and so we split the argument into several stages.

Step 0. Preliminary estimates.

We begin with a basic lemma which already shows the importance of the normalization (75).

Lemma 6.9 ($b_P^1$ spans $S(I_P)/S_0(I_P)$). For any tile $P \in \mathbf{P}^+$ and any $f \in S(I_P)$, we have

$$\|f\|_{L^2(I_P)} \lesssim \|f - [f]_P\|_{L^2(I_P)} + |I_P|^{-1/2}\,|\langle f, b_P^1\rangle|.$$

Similarly for $b_P^2$.

Proof. Let $h$ be an arbitrary element of $S(I_P)$ with $\|h\|_2 = 1$. Then

$$\langle f, h\rangle = \langle f, h - [h]_P\,b_P^1\rangle + [h]_P\,\langle f, b_P^1\rangle = \langle f - [f]_P, h - [h]_P\,b_P^1\rangle + [h]_P\,\langle f, b_P^1\rangle.$$

By Cauchy-Schwarz and (76) we thus have

$$|\langle f, h\rangle| \lesssim \|f - [f]_P\|_{L^2(I_P)} + |I_P|^{-1/2}\,|\langle f, b_P^1\rangle|.$$

Taking suprema over all $h$, the claim follows. ■

A useful application of the above lemma is the following convenient truncation property of the $b_P^1$ and $b_P^2$ (already observed in |1TT| ):

Corollary 6.10. Let $P$, $Q$ be lacunary tiles with $Q <' P$. If we have the estimate

$$\int_{I_Q} |Tb_P^1|^2 + |b_P^1|^2 \le K\,|I_Q| \qquad (77)$$

for some $K > 1$, then we have

$$\int_{2I_Q} |T(b_P^1\,\chi_{I_Q})|^2 \lesssim K\,|I_Q|.$$

Similarly for $b_P^2$ (but with $T$ replaced by $T^*$).

Proof. By ([5^ ) and (77) the portion of the integral on $2I_Q \setminus I_Q$ is acceptable, so it suffices to bound the integral on $I_Q$. From Cauchy-Schwarz, (76) and (77) we have

$$|\langle T(b_P^1\,\chi_{I_Q}), b_Q^2\rangle| = |\langle b_P^1\,\chi_{I_Q}, T^*b_Q^2\rangle| \le \|b_P^1\|_{L^2(I_Q)}\,\|T^*b_Q^2\|_{L^2(I_Q)} \lesssim K^{1/2}\,|I_Q|.$$

By Lemma 6.9 it thus suffices to show that

$$\|T(b_P^1\,\chi_{I_Q}) - [T(b_P^1\,\chi_{I_Q})]_Q\|_{L^2(I_Q)} \lesssim K^{1/2}\,|I_Q|^{1/2}.$$

Now observe that for every $h \in S_0(I_Q)$ we have

$$\langle T(b_P^1\,\chi_{I_Q}), h\rangle = \langle b_P^1\,\chi_{I_Q}, T^*h\rangle = \langle b_P^1, T^*h\rangle = \langle Tb_P^1, h\rangle.$$

By duality this implies that

$$T(b_P^1\,\chi_{I_Q}) - [T(b_P^1\,\chi_{I_Q})]_Q = T(b_P^1) - [T(b_P^1)]_Q$$

on $I_Q$. The claim then follows from (77). ■

This Corollary will be useful in estimating the operator $T$ when acting on objects such as $\psi_Q^1$ which can be expressed as linear combinations of truncated versions of $b_P^1$. Similarly when estimating $T^*$ on objects such as $\psi_Q^2$.
Step 1. Overview of main argument. 

We now begin the main argument. Let $A$ be the best constant such that

$$\|T^*\chi_{I_P}\|_{L^1(I_P)} \le A\,|I_P|$$

for all tiles $P \in \mathbf{P}^+$. We claim that $A = O(1)$; from this and the corresponding claim for $T\chi_{I_P}$ (which is of course symmetric) the theorem will follow from the local $T(1)$ theorem (Corollary 6.3).

In fact we will show

$$\Big|\int_{I_P} Tf\Big| \le ((1-\varepsilon)A + O(1))\,|I_P|\,\|f\|_\infty \qquad (78)$$

for all tiles $P \in \mathbf{P}^+$ and $f \in S(I_P)$, and some $0 < \varepsilon \ll 1$ depending only on the implicit constant in (76). By duality this implies that $A \le (1-\varepsilon)A + O(1)$, which will prove the desired bound on $A$.

Fix $P$, $f$. We shall prove the estimate (78) in three steps. Firstly (in Step 2), we decompose $f$ and reduce matters to proving a Carleson measure type estimate on the wavelet coefficients $|\langle T^*\chi_{I_P}, \psi_Q^1\rangle|^2$; this argument shall use stopping-time arguments (which we encapsulate as Lemma 6.11) based on $b_P^1$ but not on $b_P^2$. Then (in Step 3), we decompose $\chi_{I_P}$ and use stopping time arguments (again using Lemma 6.11) based on $b_P^2$ but not on $b_P^1$. It will be important not to try to handle $b_P^1$ and $b_P^2$ at the same time, as we will lose the crucial $(1-\varepsilon)$ packing property of the trees left out by the stopping time algorithm if we do so.

The purpose of these stopping arguments is to impose some pseudo-accretivity and other regularity properties on the $b_P^1$ and $b_P^2$. Once we have enough regularity properties, we can then (in Step 4) do an elementary computation to estimate the wavelet coefficients $|\langle T^*\chi_{I_P}, \psi_Q^1\rangle|$ pointwise by the quantities which we know to be controlled by hypothesis (see (76) below).
Step 2. Pruning the bad tiles of $b_P^1$.

We now begin the first of the three steps outlined above. We would like to break up $f$ into linear combinations of the wavelets $\psi_Q^1$, but we cannot do this for all $Q$ because we do not control the strong pseudo-accretivity of $b_P^1$. However, by using Lemma 6.5 and some other selection algorithms we can find a large subtree of Tree($P$) for which we can decompose $f$ as desired, modulo acceptable errors:

Lemma 6.11. Let $P \in \mathbf{P}^+$ be a tile. Then we can partition

$$\mathrm{Tree}(P) = T_1 \cup \mathbf{P}_{buffer} \cup \bigcup_{T' \in \mathbf{T}} T'$$

where

• $\mathbf{T}$ is a collection of disjoint complete trees in Tree($P$) whose tops form a $(1-\varepsilon)$-packing of Tree($P$) for some $0 < \varepsilon \ll 1$ (depending only on the implicit constant in (76));

• $T_1$ is a tree with top $P$ such that $b_P^1$ is strongly pseudo-accretive on $T_1$ (with constants perhaps depending on $\varepsilon$);

• $\mathbf{P}_{buffer}$ is a 2-packing of Tree($P$), $T_1 \cup \mathbf{P}_{buffer}$ is convex, $b_P^1$ is pseudo-accretive on $T_1 \cup \mathbf{P}_{buffer}$ and we have the mean bounds

$$\|\,|b_P^1|^2 + |Tb_P^1|^2\|_{\mathrm{mean}^*(T_1 \cup \mathbf{P}_{buffer})} \lesssim 1 \qquad (79)$$

(with the implicit constant depending on $\varepsilon$).

• We have the decomposition 

QeTi T'eT QePtu/Zer 

whenever f G S{Ip), where the "buffer functions" (pq are supported on Iq, have 
mean zero, and take the form 

where the co-efficients oq, a'g, a'q, a'q depend on f and the bp and (when \Iq\ ^ 2^^'^ ) 
obey the bounds 

I«qI + Wq\ + Wq\ + Wq\ ^ ll/lloo- (81) 

A similar statement holds with $b_P^1$ and $Tb_P^1$ replaced by $b_P^2$ and $T^*b_P^2$ (but the sets $T_1$, $\mathbf{P}_{buffer}$ and $\mathbf{T}$ are different then).

The tree $T_1$ represents the "good" portion of the tree Tree($P$), in which $b_P^1$ is neither too large nor too small (so in particular the $W_{b_P^1}$ wavelet system is well-behaved on $T_1$). The buffer tiles $\mathbf{P}_{buffer}$ are those tiles immediately above $T_1$ (and are thus slightly less "good"), while the remaining trees $\mathbf{T}$ have no good properties at all, except that they only occupy at most $(1-\varepsilon)$ of the tree Tree($P$). This decomposition shares many features in common with Lemma p.7| (for instance, the trees $\mathbf{T}$ are formed from those intervals where $b$ is too "heavy" or too "light").

In terms of the phase plane (but adapted to the $W_{b_P^1}$ wavelet system instead of the Haar wavelet system), one can interpret the right-hand side of (80) as follows. The first term corresponds to the region of phase space below the tree $T_1$. The second term corresponds to $T_1$ itself. The third term corresponds to the region above $T_1 \cup \mathbf{P}_{buffer}$, while the last term is an error term corresponding to the region $\mathbf{P}_{buffer}$. In the model case $b_P^1 = \chi_{I_P}$, (80) simplifies to

$$f = [f]_P\,\chi_{I_P} + \Pi_{T_1} f + \sum_{T' \in \mathbf{T}} \Pi_{T'} f + \sum_{Q \in \mathbf{P}_{buffer}} Wf(Q)\,\phi_Q.$$



Proof We begin by applying Lemma |6.5| to Tree(P) to find a preliminary collection Tq 
of disjoint convex trees in Tree(P) such that the tops of To are a (1 — 2£:)-packing, and 
such that bp is pseudo-accretive (with constants depending on e) on the tree 

T2:=Tree(P)\ |J T'. 



T'GT, 







However we do not yet have (|79D . To obtain these bounds we let Q denote the set of all 
tiles Q G T2 for which 

and which are maximal with respect to <'. If the constant C is chosen large enough, then 
Q is a e-packing of Tree(P). Thus if we define 

T := To U U (Tree(g) n T2) 

<96Q 



then we see that (p9|) holds on the tree 

T3 := Tree(P)\ |J T', 

T'GT 

while the tops of T are still a (1 — £:)-packing of Tree(P). 

We now perform one minor modification to T to make T sibling-free. If T contains 
two trees whose tops Pt', Pt" are siblings, we can concatenate these trees and add a new 
tile 2Pt' = 2Pt" to join these trees to a larger tree without affecting the (1 — e)-packing 
nature of the tree tops. Repeating this process as often as necessary (it must terminate 
since Tree(P) only has a finite number of tiles) we can make T sibling-free. 

For similar reasons we may assume that the trees0 in T are complete, since we can 
always replace an incomplete tree by the completion of that tree, absorbing any sub-trees 
that were also in T if necessary. 

We now define $\mathbf{P}_{\mathrm{buffer}}$ to be the set of tiles $Q$ in $T_3$ such that one or both of the
children $Q_l$, $Q_r$ of $Q$ are not in $T_3$. Since $T_3$ is a convex tree, the children of $Q$ which are not
in $T_3$ must have disjoint spatial supports as $Q$ varies in $\mathbf{P}_{\mathrm{buffer}}$. This implies that $\mathbf{P}_{\mathrm{buffer}}$
is a 2-packing. We now set $T_1 := T_3 \setminus \mathbf{P}_{\mathrm{buffer}}$. Note that all children of tiles in $T_1$ lie in
$T_1 \cup \mathbf{P}_{\mathrm{buffer}}$, so that $b^1_P$ is strongly pseudo-accretive on $T_1$, but is merely pseudo-accretive
on $T_1 \cup \mathbf{P}_{\mathrm{buffer}}$.
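
The definition of the buffer can be made concrete on a small example. This is a sketch only, assuming tiles are identified with dyadic intervals encoded as pairs `(j, n)` and a tree is a plain set of such pairs; these names are hypothetical. It extracts the tiles having at least one child outside the tree and checks that children of the remaining tiles stay inside the tree.

```python
# Hypothetical encoding, not from the paper: (j, n) is the dyadic interval
# [n * 2**-j, (n + 1) * 2**-j), and a tree is a plain set of such pairs.

def children(tile):
    j, n = tile
    return [(j + 1, 2 * n), (j + 1, 2 * n + 1)]

def buffer_tiles(tree):
    """Tiles of the tree with at least one child outside the tree."""
    return {q for q in tree if any(c not in tree for c in children(q))}

def split_off_buffer(tree):
    """Return (interior, buffer), mirroring T_1 := T_3 minus the buffer."""
    tree = set(tree)
    buf = buffer_tiles(tree)
    return tree - buf, buf

t3 = {(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)}
t1, p_buffer = split_off_buffer(t3)
# Every child of an interior tile lies in the interior or in the buffer.
assert all(c in t1 | p_buffer for q in t1 for c in children(q))
```

In this example the buffer has total length $1/2 + 1/4 + 1/4 = 1$, comfortably within the 2-packing bound for the top interval.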

^"Alternatively, we could avoid these modifications by combining the stopping time argument here 
with the one in Lemma 6.5.
The algorithm here is extremely similar to the one used to prove Theorem 3.5. Indeed, one can



even re-use Figure 3.L The trees Tq are the "light" trees where bp has too small a mean; the tiles Q 
correspond to the circled "heavy" tiles, where $b^1_P$ or $Tb^1_P$ has too large an $L^2$ norm. The tiles $\mathbf{P}_{\mathrm{buffer}}$
are thus the buffer tiles, which are the ones just below the heavy or light tiles, as well as the tiles at the
very finest scale.

^^Of course, because we made T sibling-free, the only way both the children of Q fail to be in T3 is if 
Q is at the finest scale, i.e. if \Iq \ = 2^*^. 
The only property left to verify is the decomposition (80). If $f$ is a constant multiple
of $b^1_P$ then only the first term is non-zero (thanks to (66)) and the claim is easily verified.
By subtracting off a constant multiple we may thus assume that $f$ has mean zero on $I_P$.

It will suffice to prove the identity assuming that 

$$[b^1_P]_Q,\ [b^1_P]_{Q_l},\ [b^1_P]_{Q_r} \neq 0 \quad \text{for all } Q \in \mathrm{Tree}(P),$$

since the general case then follows by an obvious limiting argument (the bounds (0) will 
not depend quantitatively on the above condition). In this case (70) applies. Comparing
this with (80) and using the mean zero condition, we reduce to showing that

QeP6n//er T'GTQ'GT' T'GT Q&Pbuffer 

for suitable (pQ. 

Let $Q \in \mathbf{P}_{\mathrm{buffer}}$. First suppose that neither child of $Q$ is a top of a tree in $\mathbf{T}$ (since
Tree(P) is complete, this can only happen when \Iq\ = 2~^^). In this case we simply set 

Now suppose that one child of $Q$ is a top of a tree $T'$ in $\mathbf{T}$; without loss of generality
we assume $Q_l$ is such a top. Since $\mathbf{T}$ is sibling-free, $Q_r$ is not a top and must therefore lie
in $T_1 \cup \mathbf{P}_{\mathrm{buffer}}$. In particular we have the lower bounds

$$|[b^1_P]_Q|,\ |[b^1_P]_{Q_r}| \gtrsim 1. \qquad (82)$$

We do not have good lower bounds on $|[b^1_P]_{Q_l}|$, but fortunately we can take advantage of
some "wiggle room" in the buffer, and avoid using this in our computations by exploiting
the identity

Wbl,f{Q)^PQ^ + E Wbrj{Q')4', = E Wb\,fiQ')4" - E w>^i,fiQ')4"- 

Q'&T' Q/gTree{Q) Q'eTree(Q,.) 

The function XlQ'GTree(Q) ^h\,f{Q')'^Qi clearly is supported on Iq and has mean zero, 
while 

/- E w^6^/(Q')4= E w^^>^/w')4 

Q'eTree(Q) Q'GTree(P)\Tree(Q) 

is a constant multiple of h\, on /t'. Thus we have 



Q'eTree((3) 



P\Q 



Similarly we have 



Q'eTree{Q,) ^ 'PJ'^r 

Subtracting the two we thus see that 

qT^, ["p\q ["plQr 



^^One can easily verify (either by a dimension counting argument, or by inductively working from the 

Q 



finest scale upwards) that the wavelets i^q' for Q £ Tree(P) span So{P) 






If we thus define 

[Op\Q [bp\Q,. 

we see that (|8T[) follows; the mean zero condition can be seen by (^) and inspection. 
This completes the proof of Lemma 6.11. ■



Roughly speaking, the above lemma states that we can find a large tree $T_1$ on which
$b^1_P$ is pseudo-accretive, on which $b^1_P$ and $Tb^1_P$ are effectively bounded, and for which we
have a representation of the form (70). We now run an argument in the spirit of Lemma
3.1 to localize matters exclusively to this tree $T_1$.

We apply the above Lemma first with the $b^1_P$. We decompose $f$ using (80), thus estimating
the left-hand side of (78) by the sum of the term below $T_1$

[/]pT6^|, (83) 



the terms coming from $T_1$ (84),
the terms above $T_1 \cup \mathbf{P}_{\mathrm{buffer}}$ (85),
and the term from $\mathbf{P}_{\mathrm{buffer}}$ (86).



The contribution of (83) is $O(|I_P| \, \|f\|_\infty)$ by (76) and Hölder. For the contribution of
(84), we observe from (72) that

QeTi 

By Cauchy-Schwarz it will thus suffice to show the bound 

'Ip 

or equivalently that 



lll(T*X/p,<Onisizc(T,)<l- (87) 
One can think of (87) as a localized, $b^1_P$-adapted version of the statement $T^*(1) \in BMO$.



We will defer the proof of (87) to Step 3 of the argument. Assuming the bound (87) for
now, we move on to (85). Observe that the expression inside the $T()$ in (85) is supported
on $I_{T'}$ and has mean zero, so we may reduce the integral from $I_P$ to $I_{T'}$. From the
definition of $A$ we have

T(/x7,,)|<A|/t. 
while from (76) and Cauchy-Schwarz we have

$$\left| \int_{I_{T'}} T([f]_{P_{T'}} b^1_{P_{T'}}) \right| \lesssim |I_{T'}| \, \|f\|_\infty.$$

Adding all this up and using the fact that the tops of $\mathbf{T}$ are a $(1-\varepsilon)$-packing we can
bound (85) by $((1-\varepsilon)A + O(1))|I_P|$ as desired.
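
The reason a bound of this shape is "as desired" is a standard absorption step, sketched here under two assumptions that are labels introduced for illustration rather than statements from the text: $C$ denotes the total of the $O(1)$ contributions of all four terms, and $A$ is finite a priori (which is presumably what the truncation of $T$ guarantees). If $A$ is the best constant in the estimate being proved, then taking suprema gives

```latex
A \le (1-\varepsilon) A + C
\quad\Longrightarrow\quad
\varepsilon A \le C
\quad\Longrightarrow\quad
A \le \frac{C}{\varepsilon}.
```

The finiteness of $A$ is what permits subtracting $(1-\varepsilon)A$ from both sides; the resulting bound does not depend on the truncation.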

Finally we consider (86). Since each buffer term lies in $S_0(I_Q)$, we can estimate this term by



-■D J lo 



Q&^buffer "3 



If =2 then this vanishes because we truncated the operator T, so we assume that 
|/q| > 2~^^ . Using Cauchy-Schwarz, (|TBD, (^Tj), and Corollary |6.10| we can bound this by 

0{ E |/gi 



Q bu f f er 

which is acceptable since $\mathbf{P}_{\mathrm{buffer}}$ is a 2-packing.
Step 3. Pruning the bad tiles of $b^2_P$.

In Step 2 we reduced the proof of (78) (and thus of Theorem 6.8) to that of proving
the Carleson measure estimate (87). Along the way we managed to prune all the tiles
for which $b^1_P$ was "bad" (in that the mean of $b^1_P$ was too small, or the norm of $b^1_P$ or
$Tb^1_P$ was too large). However, $b^2_P$ is still not under control. Thus the next step shall be to
prune $b^2_P$.

We define ^ 

B ■■= |||(T*X/p,^Q')rilsize*(Ti); 

it thus suffices to prove that $B = O(1)$. In fact it will suffice to show that

E \Txi,,4)\"<((^-^)B + 0{l))\Ip,\ 

QeTinTree(P') 

for all tiles P' <' P, since the claim then follows by taking suprema over P' and solving 
for B (cf. Lemma 

Fix $P'$. We apply Lemma 6.11 again but with the $b^2_{P'}$, partitioning

$$\mathrm{Tree}(P') = T_2 \cup \mathbf{P}'_{\mathrm{buffer}} \cup \bigcup_{T' \in \mathbf{T}'} T'.$$

From the definition of $B$ and the fact that $\mathbf{T}'$ is a $(1-\varepsilon)$-packing we have (cf. Lemma 3.1)



E \{T*Xi,,4n\'<il-e)B\Ip, 
QeTinUr/gT' ^' 
so it suffices to show that 

E i(T*x/.,<hP<i/p'i- 

QeTin(T2UP;,„^^^J 
From (79) it will suffice to show

E \{T^*Xi,,b],4)\'<\Ip,\. 
Qerin(T2UP',„^^^J 
We first observe that

KT*X/P,&^4)| = I / T(6^0Qhl = l/ T(6},4)| < |Jq|V2||T(6},4)I|2; 
Jip Jiq 

from Corollary 6.10 and (79) we thus have the weak Carleson bound

\{T*Xi,,b\,4)\'<\lQ\ 

for all $Q \in T_1$. In particular we see that the contribution of $\mathbf{P}'_{\mathrm{buffer}}$ will be acceptable
since $\mathbf{P}'_{\mathrm{buffer}}$ is a 2-packing. We are thus left with showing

QGTinT2 

In Step 4 we shall show the pointwise bound 

\{T*Xi„b]>4)\ ^ \WbUQ)\ + \W,^^ib].T*ibl,)){Q)\ + \W,^^{bl,^{b].)){Q)\ + \W,^^{T{b],)){Q)\ 

(88) 

for all $Q \in T_1 \cap T_2$. The claim will then follow from (73).
Step 4. Pointwise estimates on wavelet coefficients. 



In the previous step we reduced matters to proving the pointwise estimate (88). The
proof of this estimate is really the core of our argument, although it was necessary to do
all the above prunings to get to a point where this estimate became both provable and
useful.

We now prove (R3). Fix Q G Ti fl T2. On Iq, we can decompose xip = jj^] — I" ^ where 

F is the mean zero function F := xiq \b\]Q • ^^^^^ T(6p0q') is supported on Jq, we 

can thus estimate the left-hand side of (p8| ) by 

\('^*jS^^b].<p'^n\ + \{T*F,b\.4)\. 
[bp'lQ 

The first term is |0(M4i (6pT*(6p,))((5))| by the pseudo-accretivity of 6p,. For the second 
term, observe that if the (pQ could be moved inside the T*, thus 

l(T*(4^)'^p)l 

then by moving the T* to the other side, we could bound this by 

o{\w,^^{F^{bm = o{\w,r^{bi,T{b\,))m) + o{\w,r^{Tmm) 

again using the pseudo-accretivity of $b^2_{P'}$. Thus it suffices to control the commutator

|(T*F,&),0g')-(T*(4F),6^)|. 



^^While the tree $T_1 \cap T_2$ is not necessarily convex, the larger tree $(T_1 \cup \mathbf{P}_{\mathrm{buffer}}) \cap (T_2 \cup \mathbf{P}'_{\mathrm{buffer}})$ is,
and on this larger tree we have pseudo-accretivity and (79) for both $b^1_P$ and $b^2_{P'}$.

^^If one wished to prove a global T(&) theorem, with globally para-accretive L°° functions bp, 5p, one 
could dispense with the selection algorithms and go directly to (a suitable analogue of) (88). We omit
the details.
If $F$ had mean zero on both $I_{Q_l}$ and $I_{Q_r}$ then this commutator would be zero from (53),
since $\phi_Q$ is constant on $I_{Q_l}$ and $I_{Q_r}$. Thus we may freely replace $F$ by $[F]_{Q_l}(\chi_{I_{Q_l}} - \chi_{I_{Q_r}})$
since the difference has mean zero on both $I_{Q_l}$ and $I_{Q_r}$. Throwing the $T^*$ on to the other
side and using Cauchy-Schwarz and Corollary 6.10, we can thus bound this commutator
by

However a computation shows that 

\Qi = \Qi - \C ^ = = -l/^l 



Q 



and the claim then follows by the pseudo-accretivity of $b^2_{P'}$. The proof of Theorem 6.8 is
now complete.



One can generalize (76) to

$$\int_{I_P} |b^1_P|^p + |T b^1_P|^{q'} + |b^2_P|^q + |T^* b^2_P|^{p'} \lesssim |I_P|$$

for any $1 < p, q \le \infty$, with $1/p + 1/p' = 1/q + 1/q' = 1$ (this was already suspected
in the case $p = q = \infty$); the dual exponents are necessary to control such expressions
as $\langle T b^1_P, b^2_P \rangle$. Most of the argument proceeds mutatis mutandis except for Lemma 6.7.



Firstly in (^) the |6p mean has to be replaced by some other mean, but because of 
Corollary ^]3| we still recover ( |7^ . However we still must modify ( |72D , ([73| ) to 



mean* (T) 



PeT 



and 



mean* (T) ' 



This proceeds by replacing / and b'f with averaged variants as in the proof of (^); the 
averages will then be controlled in L°° and hence in L^. We omit the details. 

It is also straightforward to generalize Theorem 6.8 to Calderón-Zygmund operators
which do not obey the perfect dyadic cancellation condition (53), and instead obey a
more classical cancellation condition such as $|\nabla_x K(x,y)| + |\nabla_y K(x,y)| \lesssim 1/|x-y|^2$.
However it is still convenient to impose a truncation condition on the kernel when $|x-y|$
is extremely small or extremely large (as in e.g. JTT|). 



The main new difference with these kernels is that when $f \in S_0(I)$, the function
$Tf$ is no longer supported in $I$ but has a tail at infinity. However the cancellation
conditions ensure that this tail is quite rapidly decaying (like if we assume the 

above gradient bounds). This causes many of the identities used in the arguments above 
to pick up some error terms; for instance, if $g$ is constant on $I$ it is no longer true that
$\langle Tf, g \rangle = [g]_I \langle Tf, 1 \rangle$. However, the error term incurred is quite manageable due to the
good decay (especially if $g$ is in fact constant on a much wider interval than $I$). We will
not pursue the details further here as they are rather standard (see e.g. [0, fill). A 
perhaps more interesting generalization would be to non-doubling situations as in as 
this may have applications to analytic capacity problems, but this seems to require much 
more technical arguments. 
As a corollary of the local T(b) theorem we can conclude a global T(b) theorem. We
recall that a function $b$ is para-accretive if for every tile $P$ there exists a tile $Q \le' P$ with
$|I_Q| \sim |I_P|$ and $|[b]_Q| \gtrsim 1$. Every pseudo-accretive function is para-accretive (just take
$Q := P$), but not conversely.
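
A small discrete example may help separate the two notions. The sketch below rests on assumptions that are not in the text: the function is sampled on $2^{\text{levels}}$ equal cells of $[0,1)$, tiles are identified with dyadic intervals `(j, n)`, pseudo-accretivity is read as $|[b]_P| \ge c$ for every tile, and "$|I_Q| \sim |I_P|$" is modelled as "$Q$ lies at most `depth` generations below $P$". All function names are hypothetical.

```python
# Hypothetical discretisation, not from the paper: b is a list of 2**levels
# samples on [0, 1), and (j, n) is the dyadic interval [n * 2**-j, (n + 1) * 2**-j).

def dyadic_mean(b, j, n):
    size = len(b) >> j
    return sum(b[n * size:(n + 1) * size]) / size

def tiles(levels):
    return [(j, n) for j in range(levels + 1) for n in range(1 << j)]

def is_pseudo_accretive(b, levels, c):
    """Assumed reading: |[b]_P| >= c for every tile P."""
    return all(abs(dyadic_mean(b, j, n)) >= c for j, n in tiles(levels))

def is_para_accretive(b, levels, c, depth):
    """Some Q inside P, at most `depth` generations down, has |[b]_Q| >= c."""
    def good(j, n):
        return any(
            abs(dyadic_mean(b, j + d, (n << d) + k)) >= c
            for d in range(min(depth, levels - j) + 1)
            for k in range(1 << d)
        )
    return all(good(j, n) for j, n in tiles(levels))

b = [1, 1, -1, -1]  # mean zero on [0, 1), mean +1 or -1 on each half
pseudo = is_pseudo_accretive(b, 2, 0.5)
para = is_para_accretive(b, 2, 0.5, 1)
```

Here `pseudo` is `False` because the mean over the whole interval vanishes, while `para` is `True` because each tile contains a child (or is itself a tile) on which the mean has modulus one.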

Corollary 6.12 (Dyadic global T(b) theorem). Let $b_1, b_2$ be para-accretive functions with

$$\|b_1\|_{BMO},\ \|b_2\|_{BMO} \lesssim 1.$$

Let $T$ be a perfect dyadic Calderón-Zygmund operator obeying the modified weak boundedness
property

$$|\langle T(b_1 \chi_I), b_2 \chi_J \rangle| \lesssim |K| \quad \text{for all intervals } I, J, K \text{ with } |I| \sim |J| \sim |K| \text{ and } I, J \subseteq K \qquad (89)$$

and such that

$$\|T(b_1)\|_{BMO},\ \|T^*(b_2)\|_{BMO} \lesssim 1.$$

Then $T$ is bounded on $L^2$.



Proof. We apply Theorem 6.8 with



[bih 



where for each $P$, we choose the tile $Q \le' P$ so that $|I_Q| \sim |I_P|$ and $|[b_1]_Q| \gtrsim 1$. We define
$b^2_P$ similarly.

The normalization (^) is clear. To prove ([76|) , we observe that 



,1 _ (^1 - [bi]Q)xiQ , ^ 
[bilQ 

and so the bound on $b^1_P$ follows from the BMO control on $b_1$ and the lower bound on
|[6i]q|. To control T6p, it suffices from (|5^ ) and the lower bound on |[6i]q| to show that 



$$\|T(b_1 \chi_{I_Q})\|_{L^2(I_Q)} \lesssim |I_Q|^{1/2},$$

or in other words that

$$|\langle T(b_1 \chi_{I_Q}), h \rangle| \lesssim |I_Q|^{1/2} \|h\|_2$$

for all $h \in S(I_Q)$.

Select a tile $R \le' Q$ such that $|[b_2]_R| \gtrsim 1$ and $|I_R| \sim |I_Q|$. If $h$ is a scalar multiple of
$b_2 \chi_{I_R}$, the claim follows from (89). Thus we may subtract multiples of $b_2 \chi_{I_R}$ and reduce to
the case when $h$ has mean zero (cf. Lemma 6.9). But then $\langle T(b_1 \chi_{I_Q}), h \rangle = \langle T b_1, h \rangle$, and
the claim follows since $T b_1 \in BMO$.

One can control $b^2_P$ and $T^* b^2_P$ by identical arguments. ■



^^Of course, the global T(b) theorem could be proven directly in a much simpler manner, but one
advantage of doing things this way is that we can relax the hypotheses of the global T(b) theorem slightly.
If $b_1, b_2$ were bounded and strongly pseudo-accretive, one could obtain a direct proof of the global T(b)
theorem by using the modified wavelet transforms Wt^ , to define paraproducts by adapting ( p4|) , 
and then finding an analogue of (^), and repeating the proof of the global T(l) theorem. See e.g. ^ 
for details. An alternate approach based on (RSh is also possible. 
Notice that $b_1, b_2$ are only assumed to be in BMO rather than $L^\infty$. This generalization
of the standard T(b) theorem appears to be new. Also observe that the above argument
also works in the special case $T(b_1) = T^*(b_2) = 0$ if we drop the para-accretivity and
BMO hypotheses on $b_1, b_2$ and instead impose the reverse Hölder conditions



on $b_1, b_2$. It is in fact likely that we can obtain a T(b)-type theorem for arbitrary (complex)
dyadic A^o weights 61, 62 (see [^), but we will not attempt to give the most general 
statements here. 